Intermediate Point Data (Taxi Trip Duration)

First 2000 rows with intermediate point data using Google Maps API

@kaggle.artimous_intermediate_point_data_taxi_trip_duration

About this Dataset

Context

Knowing which route a taxi takes from one location to another helps explain why some trips take longer than others. Most taxis also rely on Google Maps for navigation, which strengthens the use case for this dataset. Looking closer, we can begin to analyse patches of slow traffic and the number of steps in a trip (explained below).

Content

The data contains the following columns:

  • trip_id, pickup_latitude, pickup_longitude (and their dropoff equivalents) are taken from the original dataset.
  • distance : The estimated distance between the start and end locations, in miles.
  • start_address and end_address are taken directly from the Google Maps API.
  • params : A detailed set of parameters, flattened into a single line (explained below).

Parameters

The parameters field is a long string holding a flattened JSON object. At its most basic, the field consists of space-separated steps. The syntax is as follows:

Step1:{ ... }, Step2:{ ...

Each step denotes the presence of an intermediate point.

Inside the curly braces of each step we have the distance for that step, measured in ft or mi, and the start and end locations. Each location is enclosed in parentheses, in the following format:

Step1:{distance=X ft/mi start_location=(latitude, longitude) end_location ...}, ...

One can split the contents of each step on spaces to get all the required values.
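As a rough illustration of reading this field, here is a minimal Python sketch. It assumes that end_location follows the same `=(latitude, longitude)` form as start_location (the format line above elides it) and that distances are plain numbers followed by `ft` or `mi`. The function name `parse_params` and the sample string are hypothetical and are not taken from the dataset. Regular expressions are used rather than a plain split on spaces, because each coordinate pair itself contains a space after the comma.

```python
import re

# Patterns inferred from the format described above (assumptions, not a spec).
STEP_RE = re.compile(r"Step(\d+):\{(.*?)\}")
DIST_RE = re.compile(r"distance=([\d.,]+)\s*(ft|mi)")
LOC_RE = re.compile(
    r"(start_location|end_location)=\(\s*(-?[\d.]+),\s*(-?[\d.]+)\s*\)"
)

FEET_PER_MILE = 5280.0


def parse_params(params):
    """Return one dict per step found in a flattened params string."""
    steps = []
    for match in STEP_RE.finditer(params):
        body = match.group(2)
        step = {"step": int(match.group(1))}

        dist = DIST_RE.search(body)
        if dist:
            value = float(dist.group(1).replace(",", ""))
            # Normalise every distance to miles so steps are comparable.
            if dist.group(2) == "ft":
                value /= FEET_PER_MILE
            step["distance_mi"] = value

        # Each location is stored as a (latitude, longitude) tuple of floats.
        for loc in LOC_RE.finditer(body):
            step[loc.group(1)] = (float(loc.group(2)), float(loc.group(3)))

        steps.append(step)
    return steps


# Hypothetical sample row, written to match the documented format.
sample = (
    "Step1:{distance=528 ft start_location=(40.70, -73.90) "
    "end_location=(40.71, -73.91)}, "
    "Step2:{distance=1.5 mi start_location=(40.71, -73.91) "
    "end_location=(40.75, -73.95)}"
)
steps = parse_params(sample)
```

The length of the returned list gives the number of steps for a trip, and the per-step distances can be compared against the trip duration to look for slow patches.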

Acknowledgements

All credit for the data goes to the Google Maps API, which is limited to 2000 queries per day. I believe that even this limited amount can yield useful insights.

Future prospects

  • More data : Since only 2000 rows have been processed so far, a good response might make it possible to get more. If you feel like contributing, please have a look at the script here and try running it for the next 2000 rows.

  • Driver instructions : I did not include the driver instruction column from the Google API, as it seemed too complex to use in any kind of model. If that is not the general opinion, I can add it here.
