NJ Transit + Amtrak (NEC) Rail Performance
Granular performance data from 150k+ NJ Transit and Amtrak train trips
@kaggle.pranavbadami_nj_transit_amtrak_nec_performance
NJ Transit is the second-largest commuter rail network in the United States by ridership; it spans New Jersey and connects the state to New York City. Amtrak also operates passenger service on the Northeast Corridor, the busiest passenger rail line in the United States. Together, NJ Transit and Amtrak operate nearly 750 trains across the NJ Transit rail network.
Although the network serves over 300,000 riders on the average weekday, no granular, trip-level performance data is publicly available for the NJ Transit rail network or Amtrak. This dataset aims to make such data publicly available.
This dataset contains monthly CSVs covering the performance of nearly every train trip on the NJ Transit rail network.
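Since the data ships as one CSV per month, a common first step is to read all of the files into a single collection. The following is a minimal sketch using only the Python standard library. The function name `load_monthly_csvs`, the `source_file` field, and the assumption that the file names sort chronologically are all hypothetical; the actual file names and columns should be checked against the dataset itself.

```python
import csv
from pathlib import Path


def load_monthly_csvs(data_dir):
    """Read every CSV in data_dir and return all rows as a list of dicts."""
    rows = []
    # Sorting by file name keeps the months in a stable order, assuming the
    # (hypothetical) file names sort chronologically.
    for path in sorted(Path(data_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                # Hypothetical extra field: remember which monthly file
                # each row came from.
                row["source_file"] = path.name
                rows.append(row)
    return rows
```

Calling `load_monthly_csvs("path/to/dataset")` returns one dictionary per CSV row, keyed by whatever column headers the files actually contain.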
As of May 19, 2019:
Since February 2018, I have been running a scraper that gathers stop-level, minute-resolution data for NJ Transit and Amtrak train trips operating on the NJ Transit rail network. The scraper polls the NJ Transit DepartureVision Real Time Train Status service every minute. The raw, timestamped train status pages are stored in a data lake and then parsed into tabular form; the parser is implemented as a state machine.
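To illustrate how a state machine can turn minute-by-minute snapshots into stop-level rows, here is a minimal sketch. It is not the project's actual parser: the snapshot tuple layout, the `"DEPARTED"` status text, the two states, and all names (`StopState`, `StopRecord`, `parse_snapshots`) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum, auto
from typing import Optional


class StopState(Enum):
    PENDING = auto()   # the train has not yet been reported as departed
    DEPARTED = auto()  # a snapshot has reported the train as departed


@dataclass
class StopRecord:
    train_id: str
    stop: str
    state: StopState = StopState.PENDING
    departed_at: Optional[datetime] = None


def parse_snapshots(snapshots):
    """Fold timestamped snapshots into one record per (train, stop).

    Each snapshot is a hypothetical (timestamp, train_id, stop, status)
    tuple standing in for one parsed train status page.
    """
    records = {}
    for ts, train_id, stop, status in sorted(snapshots, key=lambda s: s[0]):
        rec = records.setdefault((train_id, stop), StopRecord(train_id, stop))
        # Transition PENDING -> DEPARTED the first time the status appears;
        # later repeats are ignored, so the earliest timestamp is kept.
        if rec.state is StopState.PENDING and status == "DEPARTED":
            rec.state = StopState.DEPARTED
            rec.departed_at = ts
    return list(records.values())
```

Because the same status can appear in many consecutive snapshots, tracking an explicit state per stop means only the first observation of a change is recorded, which is one reason a state machine suits this kind of repeated polling.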
For more details on these processes, and for ancillary metadata (such as schedules and station locations) from the NJ Transit Developer Portal, check out the project GitHub repo.
Lots of interesting, high-impact projects could be driven by this data:
For some more inspiration, you can check out Medium articles written by Michael Zhang and me with this data:
A special thanks to Michael Zhang for his valuable work on using and preparing this data, as well as general support throughout the project.