Baselight

NJ Transit + Amtrak (NEC) Rail Performance

Granular performance data from 150k+ NJ Transit and Amtrak train trips

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance

About this Dataset

Context

NJ Transit is the second largest commuter rail network in the United States by ridership; it spans New Jersey and connects the state to New York City. On the Northeast Corridor, the busiest passenger rail line in the United States, Amtrak also operates passenger rail service; together, NJ Transit and Amtrak operate nearly 750 trains across the NJ Transit rail network.

Although the network serves over 300,000 riders on an average weekday, no granular, trip-level performance data is publicly available for the NJ Transit rail network or Amtrak. This dataset aims to make such data publicly available.

Content

This dataset contains monthly CSVs covering the performance of nearly every train trip on the NJ Transit rail network.

As of May 19, 2019:

  • Stop-level, minute-resolution data on 287,000+ train trips (248,000+ NJ Transit trips, 38,000+ Amtrak trips)
  • Coverage from March 1, 2018 to April 30, 2019 (updated monthly)
  • Transparent reporting on train trips for which data was missing/invalid, or that were scraped or parsed incorrectly (97.5% of train trips were correctly captured)
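The 97.5% figure appears consistent with the table sizes listed further down this page, under the assumption (not stated by the author) that each row of the invalid_trains table is one trip that failed validation during the same coverage period. A minimal sketch of that arithmetic, using the "287,000+" lower bound as the count of valid trips:

```python
# Rough check of the capture rate quoted above.
# Assumptions: valid trips ~ 287,000 (the "287,000+" lower bound) and
# invalid trips = 7,315 (the row count of the invalid_trains table).
valid_trips = 287_000
invalid_trips = 7_315

capture_rate = valid_trips / (valid_trips + invalid_trips)
print(f"{capture_rate:.1%}")  # prints 97.5%
```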

Since February 2018, I have been running a scraper that gathers stop-level, minute-resolution data for NJ Transit and Amtrak train trips operating on the NJ Transit rail network. The scraper collects data every minute from the NJ Transit DepartureVision Real Time Train Status service. The raw, timestamped train status pages are stored in a data lake and then parsed into tabular form; the parser is implemented as a state machine.
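To illustrate what parsing snapshots with a state machine can look like, here is a minimal sketch. It is not the project's parser: the snapshot shape (a timestamp plus a mapping of stop name to status text), the status string "DEPARTED", the stop names, and the function name parse_snapshots are all hypothetical and chosen only for illustration.

```python
from datetime import datetime
from enum import Enum


class StopState(Enum):
    PENDING = "pending"    # the train has not yet been seen leaving the stop
    DEPARTED = "departed"  # a snapshot has reported the stop as departed


def parse_snapshots(snapshots):
    """Fold time-ordered snapshots into one departure time per stop.

    Each snapshot is a hypothetical (timestamp, {stop_name: status_text})
    pair; the real status pages are not reproduced here.
    """
    state = {}        # stop name -> StopState
    departed_at = {}  # stop name -> timestamp of the first DEPARTED snapshot
    for timestamp, stops in snapshots:
        for stop, status in stops.items():
            current = state.setdefault(stop, StopState.PENDING)
            # Only one transition is allowed: PENDING -> DEPARTED.
            if current is StopState.PENDING and status == "DEPARTED":
                state[stop] = StopState.DEPARTED
                departed_at[stop] = timestamp
    return departed_at


# Made-up snapshots for a single train, one per scrape.
snapshots = [
    (datetime(2018, 3, 1, 8, 0), {"Stop A": "in 2 min", "Stop B": "in 9 min"}),
    (datetime(2018, 3, 1, 8, 3), {"Stop A": "DEPARTED", "Stop B": "in 7 min"}),
    (datetime(2018, 3, 1, 8, 4), {"Stop A": "DEPARTED", "Stop B": "in 6 min"}),
]
times = parse_snapshots(snapshots)
```

Recording only the first snapshot in which a stop is marked departed is what gives minute resolution: later snapshots that repeat the same status do not move the recorded time.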

For more details on these processes and on ancillary metadata (such as schedules and station locations) from the NJ Transit Developer Portal, check out the project GitHub repo.

Inspiration

Lots of interesting, high-impact projects could be driven by this data:

  • Robust prediction: This data could be used to derive a system-level prediction system for the NJ Transit network. Such a system could provide intelligent, targeted advance warnings of delays or cancellations for millions of riders.
  • Combining datasets: Weather data and service alert data could be incorporated to look at the effect of weather events and analyze the impacts of specific kinds of service interruptions.
  • Data visualization: Visualizing this data could provide robust insight into the system-level mechanics of the NJ Transit rail network, as well as more engaging reporting on NJ Transit.
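As a possible starting point for the ideas above, a per-line mean delay is about the simplest baseline that a prediction model or a visualization could be compared against. The sketch below uses the line and delay_minutes column names from the monthly tables; the rows themselves, including the line names, are made up for illustration.

```python
from collections import defaultdict

# Illustrative stop-level rows using column names from the monthly tables;
# the line names and delays below are made up.
rows = [
    {"line": "Line A", "delay_minutes": 1.0},
    {"line": "Line A", "delay_minutes": 7.0},
    {"line": "Line B", "delay_minutes": 0.0},
    {"line": "Line B", "delay_minutes": None},  # a missing delay is skipped
]

totals = defaultdict(lambda: [0.0, 0])  # line -> [sum of delays, row count]
for row in rows:
    delay = row["delay_minutes"]
    if delay is None:
        continue
    totals[row["line"]][0] += delay
    totals[row["line"]][1] += 1

# Mean delay in minutes for every line that has at least one usable row.
mean_delay = {line: total / count for line, (total, count) in totals.items()}
```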

For more inspiration, check out the Medium articles that Michael Zhang and I wrote using this data:

  1. The 5 Stages of a System Breakdown on NJ Transit
  2. What are the chances that NJ Transit will cause you to miss the Dinky?
  3. How data can help fix NJ Transit

Acknowledgements

A special thanks to Michael Zhang for his valuable work on using and preparing this data, as well as general support throughout the project.

Tables

N 2018–03

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2018_03
  • 3.32 MB
  • 256508 rows
  • 13 columns

CREATE TABLE n_2018_03 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" BIGINT,
  "to" VARCHAR,
  "to_id" BIGINT,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2018–04

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2018_04
  • 3.18 MB
  • 256267 rows
  • 13 columns

CREATE TABLE n_2018_04 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" BIGINT,
  "to" VARCHAR,
  "to_id" BIGINT,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2018–05

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2018_05
  • 3.39 MB
  • 266837 rows
  • 13 columns

CREATE TABLE n_2018_05 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" DOUBLE,
  "to" VARCHAR,
  "to_id" DOUBLE,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);
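Note that from_id and to_id are typed BIGINT in some monthly tables and DOUBLE in others, such as the one above. One plausible cause, which is only an assumption here, is that months containing missing station ids were imported with a floating-point type. When combining months, it may help to normalize both columns to a nullable integer first. A minimal sketch with a hypothetical helper, normalize_station_id:

```python
import math
from typing import Optional, Union


def normalize_station_id(value: Union[int, float, str, None]) -> Optional[int]:
    """Return the station id as an int, or None when it is missing."""
    if value is None or value == "":
        return None
    number = float(value)
    if math.isnan(number):
        return None
    if not number.is_integer():
        raise ValueError(f"station id is not a whole number: {value!r}")
    return int(number)


# Illustrative values only; real ids come from the from_id / to_id columns.
ids = [normalize_station_id(v) for v in (105, 105.0, "105.0", float("nan"), None)]
```

Raising on a fractional id, rather than rounding it, keeps a corrupted value from silently mapping to a neighbouring station.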

N 2018–06

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2018_06
  • 3.23 MB
  • 253952 rows
  • 13 columns

CREATE TABLE n_2018_06 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" DOUBLE,
  "to" VARCHAR,
  "to_id" DOUBLE,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2018–07

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2018_07
  • 3.33 MB
  • 261210 rows
  • 13 columns

CREATE TABLE n_2018_07 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" DOUBLE,
  "to" VARCHAR,
  "to_id" DOUBLE,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2018–08

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2018_08
  • 3.45 MB
  • 269140 rows
  • 13 columns

CREATE TABLE n_2018_08 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" DOUBLE,
  "to" VARCHAR,
  "to_id" DOUBLE,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2018–09

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2018_09
  • 3.06 MB
  • 239953 rows
  • 13 columns

CREATE TABLE n_2018_09 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" DOUBLE,
  "to" VARCHAR,
  "to_id" DOUBLE,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2018–10

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2018_10
  • 3.12 MB
  • 251566 rows
  • 13 columns

CREATE TABLE n_2018_10 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" DOUBLE,
  "to" VARCHAR,
  "to_id" DOUBLE,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2018–11

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2018_11
  • 3.02 MB
  • 236751 rows
  • 13 columns

CREATE TABLE n_2018_11 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" DOUBLE,
  "to" VARCHAR,
  "to_id" DOUBLE,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2018–12

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2018_12
  • 2.69 MB
  • 209074 rows
  • 13 columns

CREATE TABLE n_2018_12 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" BIGINT,
  "to" VARCHAR,
  "to_id" BIGINT,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2019–01

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2019_01
  • 3.13 MB
  • 233958 rows
  • 13 columns

CREATE TABLE n_2019_01 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" BIGINT,
  "to" VARCHAR,
  "to_id" BIGINT,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2019–02

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2019_02
  • 2.91 MB
  • 216055 rows
  • 13 columns

CREATE TABLE n_2019_02 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" DOUBLE,
  "to" VARCHAR,
  "to_id" DOUBLE,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2019–03

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2019_03
  • 3.07 MB
  • 239038 rows
  • 13 columns

CREATE TABLE n_2019_03 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" DOUBLE,
  "to" VARCHAR,
  "to_id" DOUBLE,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2019–04

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2019_04
  • 3.02 MB
  • 238693 rows
  • 13 columns

CREATE TABLE n_2019_04 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" BIGINT,
  "to" VARCHAR,
  "to_id" BIGINT,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2019–05

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2019_05
  • 3.15 MB
  • 250621 rows
  • 13 columns

CREATE TABLE n_2019_05 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" DOUBLE,
  "to" VARCHAR,
  "to_id" DOUBLE,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2019–06

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2019_06
  • 3.04 MB
  • 238759 rows
  • 13 columns

CREATE TABLE n_2019_06 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" DOUBLE,
  "to" VARCHAR,
  "to_id" DOUBLE,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2019–07

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2019_07
  • 3.23 MB
  • 246292 rows
  • 13 columns

CREATE TABLE n_2019_07 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" DOUBLE,
  "to" VARCHAR,
  "to_id" DOUBLE,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2019–08

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2019_08
  • 3.08 MB
  • 241700 rows
  • 13 columns

CREATE TABLE n_2019_08 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" DOUBLE,
  "to" VARCHAR,
  "to_id" DOUBLE,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2019–09

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2019_09
  • 3.05 MB
  • 234628 rows
  • 13 columns

CREATE TABLE n_2019_09 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" DOUBLE,
  "to" VARCHAR,
  "to_id" DOUBLE,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2019–10

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2019_10
  • 3.33 MB
  • 254751 rows
  • 13 columns

CREATE TABLE n_2019_10 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" DOUBLE,
  "to" VARCHAR,
  "to_id" DOUBLE,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2019–11

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2019_11
  • 3.14 MB
  • 239030 rows
  • 13 columns

CREATE TABLE n_2019_11 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" DOUBLE,
  "to" VARCHAR,
  "to_id" DOUBLE,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2019–12

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2019_12
  • 3.16 MB
  • 246063 rows
  • 13 columns

CREATE TABLE n_2019_12 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" DOUBLE,
  "to" VARCHAR,
  "to_id" DOUBLE,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2020–01

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2020_01
  • 3.13 MB
  • 247916 rows
  • 13 columns

CREATE TABLE n_2020_01 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" DOUBLE,
  "to" VARCHAR,
  "to_id" DOUBLE,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2020–02

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2020_02
  • 3.04 MB
  • 228571 rows
  • 13 columns

CREATE TABLE n_2020_02 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" DOUBLE,
  "to" VARCHAR,
  "to_id" DOUBLE,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2020–03

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2020_03
  • 2.88 MB
  • 222760 rows
  • 13 columns

CREATE TABLE n_2020_03 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" BIGINT,
  "to" VARCHAR,
  "to_id" BIGINT,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2020–04

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2020_04
  • 2.11 MB
  • 167623 rows
  • 13 columns

CREATE TABLE n_2020_04 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" BIGINT,
  "to" VARCHAR,
  "to_id" BIGINT,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

N 2020–05

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.n_2020_05
  • 1.23 MB
  • 98698 rows
  • 13 columns

CREATE TABLE n_2020_05 (
  "date" TIMESTAMP,
  "train_id" VARCHAR,
  "stop_sequence" DOUBLE,
  "from" VARCHAR,
  "from_id" BIGINT,
  "to" VARCHAR,
  "to_id" BIGINT,
  "scheduled_time" TIMESTAMP,
  "actual_time" TIMESTAMP,
  "delay_minutes" DOUBLE,
  "status" VARCHAR,
  "line" VARCHAR,
  "type" VARCHAR
);

Invalid Trains

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.invalid_trains
  • 23.56 KB
  • 7315 rows
  • 3 columns

CREATE TABLE invalid_trains (
  "date" VARCHAR,
  "train_id" VARCHAR,
  "reason" VARCHAR
);

Invalid Trains 05–01–19–05–18–20

@kaggle.pranavbadami_nj_transit_amtrak_nec_performance.invalid_trains_05_01_19_05_18_20
  • 31.63 KB
  • 18068 rows
  • 3 columns

CREATE TABLE invalid_trains_05_01_19_05_18_20 (
  "date" VARCHAR,
  "train_id" VARCHAR,
  "reason" VARCHAR
);
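If rows for flagged trips also appear in a monthly table (this page does not confirm that), one way to exclude them is an anti-join on the (date, train_id) pair. The sketch below assumes the VARCHAR date in the invalid-trains tables is ISO formatted (YYYY-MM-DD), which is not confirmed here; the rows, train ids, and reason text are illustrative.

```python
from datetime import datetime

# Illustrative rows; real rows would come from a monthly table and from
# one of the invalid-trains tables.
stops = [
    {"date": datetime(2019, 5, 1), "train_id": "3805", "delay_minutes": 2.0},
    {"date": datetime(2019, 5, 1), "train_id": "3806", "delay_minutes": 0.5},
]
invalid = [{"date": "2019-05-01", "train_id": "3806", "reason": "example reason"}]

# Assumption: invalid-train dates are ISO strings (YYYY-MM-DD).
invalid_keys = {(row["date"], row["train_id"]) for row in invalid}

# Keep only stop rows whose (date, train_id) pair was not flagged.
valid_stops = [
    row for row in stops
    if (row["date"].strftime("%Y-%m-%d"), row["train_id"]) not in invalid_keys
]
```

Matching on both columns matters because a train_id alone identifies a scheduled service, not a single day's trip.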
