Train Stations In Europe
Names, Coordinates, and Properties of European Railway Stations
@kaggle.headsortails_train_stations_in_europe
Many European countries have an extensive network of public railways connecting large and small cities. This dataset contains the names, coordinates, and basic properties of more than 36,000 train stations in (and adjacent to) Europe. It was derived from data that the Trainline EU ticketing website has published on GitHub. I will update this dataset regularly.
Note that the data contains a few train stations in the European parts of Russia and Turkey, as well as a small number of stations in the African country of Morocco.
The dataset train_stations_europe.csv
is based on the Trainline EU GitHub repo. It contained 36k+ stations at the time this Kaggle Dataset was created. The GitHub dataset contains many more columns, most of which cover operator-specific properties (e.g. Renfe or Trenitalia) or translations into different languages (most of which are missing, though). I extracted this subset to provide a more focussed and complete data source.
Most of the column descriptions below have been taken verbatim from the GitHub repo. I have added extra context or explanations where I felt they were necessary. Note that some columns contain a significant percentage of NA values.
id
: Numeric internal unique identifier. Primary key.
name
: Name of the station as it is locally known. These names include accents and other special characters.
name_norm
: Normalised version of name; transformed into the [A-Za-z] character space (aka 'Latin-ASCII') to replace special characters with their standard-Latin counterparts (e.g. è becomes e, ü becomes u).
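As a rough illustration of what such a normalisation does, the sketch below strips diacritics with Python's standard `unicodedata` module. It is an approximation and not necessarily the transform used to build name_norm: the helper name `to_latin_ascii` is hypothetical, and letters without a Unicode decomposition (such as ß or ø) pass through unchanged here, whereas a full Latin-ASCII transliteration would also map them.

```python
import unicodedata


def to_latin_ascii(name: str) -> str:
    """Approximate a Latin-ASCII transliteration by stripping diacritics.

    Hypothetical helper; an assumption, not the dataset author's code.
    """
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Keep the base letters and drop the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(to_latin_ascii("è"))
print(to_latin_ascii("ü"))
```

Comparing the output of such a helper against the name_norm column is one way to spot names where the two approaches disagree.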
uic
: The UIC code of the station. UIC is the International Union of Railways, "an international rail transport industry body". About 1/3 of all stations have no UIC code in this dataset.
longitude & latitude
: Station coordinates. About 5% of all stations have no coordinates in this dataset.
parent_station_id
: A station can belong to a meta-station whose id is this value, e.g. Paris Gare d’Austerlitz (id = 4921) belongs to the meta-station Paris (id = 4916). About 92% of rows have NA entries.
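The parent lookup can be sketched in a few lines of Python. The two rows below reuse the Paris example from the description; all other columns are omitted, and the helper name `parent_name` is hypothetical. The sketch assumes that missing parents arrive as NaN (or None), which is plausible because the column is stored as a floating-point type, but that is an assumption about how the CSV is loaded.

```python
import math

# Two rows taken from the example in the column description; other columns omitted.
stations = [
    {"id": 4916, "name": "Paris", "parent_station_id": math.nan},
    {"id": 4921, "name": "Paris Gare d’Austerlitz", "parent_station_id": 4916.0},
]

# Index the rows by their primary key for direct lookup.
by_id = {row["id"]: row for row in stations}


def parent_name(station_id):
    """Return the name of the meta-station, or None if the station has none."""
    parent = by_id[station_id]["parent_station_id"]
    # Missing values are assumed to be None or NaN.
    if parent is None or math.isnan(parent):
        return None
    # The id is stored as a float, so cast it back to int before the lookup.
    return by_id[int(parent)]["name"]


print(parent_name(4921))
print(parent_name(4916))
```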
country
: Country codes in ISO 3166-1 alpha-2 format (2 letters).
time_zone
: Time zone names in Continent/City format. These appear to be equivalent to Olson names (e.g. "Europe/Berlin").
is_city
: Marked as "unreliable" in the source dataset. Might be worth investigating what exactly that means.
is_main_station
: Marked as "unreliable" in the source dataset. Might be worth investigating what exactly that means.
All credit for creating this dataset and providing the public version goes to the Trainline EU team.
Banner and vignette photo by Michał Parzuchowski on Unsplash.
Data is distributed under the Open Database License (ODbL), see here. In short, any modification to this data source must be published.
CREATE TABLE train_stations_europe (
"id" BIGINT,
"name" VARCHAR,
"name_norm" VARCHAR,
"uic" DOUBLE,
"latitude" DOUBLE,
"longitude" DOUBLE,
"parent_station_id" DOUBLE,
"country" VARCHAR,
"time_zone" VARCHAR,
"is_city" BOOLEAN,
"is_main_station" BOOLEAN,
"is_airport" BOOLEAN,
"entur_id" VARCHAR,
"entur_is_enabled" BOOLEAN
);
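Because several columns contain many NA values, it can help to measure the share of missing entries per column before relying on one. The sketch below is one way to do that with Python's built-in `sqlite3` module, using a reduced version of the schema above. The three rows are made-up placeholders, not real stations, and the helper `null_share` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of the schema; SQLite accepts these type names as-is.
conn.execute(
    """
    CREATE TABLE train_stations_europe (
        "id" BIGINT,
        "name" VARCHAR,
        "uic" DOUBLE,
        "latitude" DOUBLE,
        "longitude" DOUBLE,
        "parent_station_id" DOUBLE
    )
    """
)

# Placeholder rows for illustration only; None stands in for NA.
rows = [
    (1, "Station A", 1000001.0, 50.0, 8.0, None),
    (2, "Station B", None, 48.0, 2.0, 1.0),
    (3, "Station C", 1000003.0, None, None, None),
]
conn.executemany(
    "INSERT INTO train_stations_europe VALUES (?, ?, ?, ?, ?, ?)", rows
)

COLUMNS = {"id", "name", "uic", "latitude", "longitude", "parent_station_id"}


def null_share(column: str) -> float:
    """Return the fraction of rows in which `column` is NULL."""
    # Column names cannot be bound as parameters, so check against a whitelist.
    if column not in COLUMNS:
        raise ValueError(f"unknown column: {column}")
    # COUNT(col) skips NULLs, so COUNT(*) - COUNT(col) is the number missing.
    total, missing = conn.execute(
        f'SELECT COUNT(*), COUNT(*) - COUNT("{column}") '
        "FROM train_stations_europe"
    ).fetchone()
    return missing / total


for col in sorted(COLUMNS):
    print(col, round(null_share(col), 2))
```

Run against the full table, the same query should roughly reproduce the missing-value shares quoted in the column descriptions.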