Baselight

Divvy Trips Clean Dataset (Nov 2024 – Oct 2025)

Cleaned & transformed Divvy bike-sharing trip data using Pandas and DuckDB

@kaggle.yeshangupadhyay_divvy_trips_clean_dataset_nov_2024_oct_2025


About this Dataset

📌 Overview

This dataset contains a cleaned and transformed version of the public
Divvy Bicycle Sharing Trip Data covering the period November 2024 to October 2025.

The raw data is published on the Chicago Open Data Portal. It has been cleaned
with Pandas (Python) and DuckDB SQL so that analysis can start without further preparation.
The dataset is ready for direct use in:

  • Exploratory Data Analysis (EDA)
  • SQL analytics
  • Machine learning
  • Time-series/trend analysis
  • Dashboard creation (Power BI / Tableau)

📂 Source

Original Data Provider:
Chicago Open Data Portal – Divvy Trips
License: Open Data Commons Public Domain Dedication (PDDL)
This cleaned dataset contains only transformations of the public source data; no proprietary or restricted data is included.


🔧 Cleaning & Transformations Performed

  • Combined monthly CSVs (Nov 2024 → Oct 2025)
  • Removed duplicates
  • Standardized datetime formats
  • Created new fields:
    • ride_length
    • day_of_week
    • hour_of_day
  • Handled missing or null values
  • Cleaned inconsistent station names
  • Filtered invalid ride durations (negative or zero-length rides)
  • Exported as a compressed .csv for optimized performance
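
The steps above can be sketched in Pandas roughly as follows. This is a minimal illustration on a made-up four-row sample, not the author's actual pipeline: the sample values, the `clean_trips` helper, and the specific rules (deduplicating on `ride_id`, trimming whitespace in station names, leaving missing station names as they are) are assumptions. Only the column names, the minutes unit for `ride_length`, and the removal of zero or negative durations come from the description.

```python
import pandas as pd

# Made-up sample in the raw Divvy column layout (values are illustrative only).
raw = pd.DataFrame(
    {
        "ride_id": ["A1", "A1", "B2", "C3"],
        "rideable_type": ["classic_bike", "classic_bike", "electric_bike", "classic_bike"],
        "started_at": [
            "2024-11-01 08:00:00",
            "2024-11-01 08:00:00",
            "2024-11-02 17:30:00",
            "2024-11-03 09:00:00",
        ],
        "ended_at": [
            "2024-11-01 08:15:00",
            "2024-11-01 08:15:00",
            "2024-11-02 17:30:00",
            "2024-11-03 09:42:00",
        ],
        "start_station_name": [" Clark St & Elm St", " Clark St & Elm St", "State St & Kinzie St", None],
        "member_casual": ["member", "member", "casual", "casual"],
    }
)


def clean_trips(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the listed cleaning steps."""
    # Remove duplicates (assumption: a ride is identified by ride_id).
    out = df.drop_duplicates(subset="ride_id").copy()

    # Standardize datetime formats.
    out["started_at"] = pd.to_datetime(out["started_at"])
    out["ended_at"] = pd.to_datetime(out["ended_at"])

    # Clean inconsistent station names (assumption: whitespace trimming only).
    out["start_station_name"] = out["start_station_name"].str.strip()

    # Derived fields: ride_length in minutes, weekday name, start hour.
    out["ride_length"] = (out["ended_at"] - out["started_at"]).dt.total_seconds() / 60
    out["day_of_week"] = out["started_at"].dt.day_name()
    out["hour_of_day"] = out["started_at"].dt.hour

    # Filter invalid ride durations (negative or zero-length rides).
    out = out[out["ride_length"] > 0]
    return out.reset_index(drop=True)


trips = clean_trips(raw)
print(trips[["ride_id", "ride_length", "day_of_week", "hour_of_day"]])
```

In this sample the duplicate `A1` row and the zero-length `B2` ride are dropped, leaving two rides. The row with a missing station name is kept, because the description does not say how nulls were handled.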

📊 Columns in the Dataset

  • ride_id
  • rideable_type
  • started_at
  • ended_at
  • start_station_name
  • start_station_id
  • end_station_name
  • end_station_id
  • start_lat
  • start_lng
  • end_lat
  • end_lng
  • member_casual
  • ride_length (minutes)
  • day_of_week
  • hour_of_day

💡 Use Cases

This dataset is suitable for:

  • DuckDB + SQL analytics
  • Pandas EDA
  • Visualization in Power BI, Tableau, Looker
  • Statistical analysis
  • Member vs. Casual rider behavioral analysis
  • Peak usage prediction

📝 Notes

This dataset is not the official Divvy dataset, but a cleaned, transformed, and analysis-ready version created for educational and analytical use.

Tables

Trips Clean

@kaggle.yeshangupadhyay_divvy_trips_clean_dataset_nov_2024_oct_2025.trips_clean
  • 193.99 MB
  • 3,738,761 rows
  • 16 columns
CREATE TABLE trips_clean (
  "ride_id" VARCHAR,
  "rideable_type" VARCHAR,
  "started_at" TIMESTAMP,
  "ended_at" TIMESTAMP,
  "start_station_name" VARCHAR,
  "start_station_id" VARCHAR,
  "end_station_name" VARCHAR,
  "end_station_id" VARCHAR,
  "start_lat" DOUBLE,
  "start_lng" DOUBLE,
  "end_lat" DOUBLE,
  "end_lng" DOUBLE,
  "member_casual" VARCHAR,
  "ride_length" DOUBLE,
  "day_of_week" VARCHAR,
  "hour_of_day" BIGINT
);
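
When loading an export of this table into Pandas, it may help to pass dtypes that mirror the schema above, so that station IDs stay text and timestamps are parsed. The sketch below reads a single made-up row from an in-memory string. The row values and the `SCHEMA_DTYPES` mapping are assumptions for illustration; a real export would be read from its file path instead.

```python
import io

import pandas as pd

# One made-up row in the trips_clean column order.
sample_csv = io.StringIO(
    "ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,"
    "end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,"
    "member_casual,ride_length,day_of_week,hour_of_day\n"
    "A1,classic_bike,2024-11-01 08:00:00,2024-11-01 08:15:00,Station A,001,"
    "Station B,002,41.90,-87.63,41.89,-87.62,member,15.0,Friday,8\n"
)

# Hypothetical mapping from the DuckDB types to pandas dtypes:
# VARCHAR -> string, DOUBLE -> float64, BIGINT -> int64.
SCHEMA_DTYPES = {
    "ride_id": "string",
    "rideable_type": "string",
    "start_station_name": "string",
    "start_station_id": "string",  # keep as text so leading zeros survive
    "end_station_name": "string",
    "end_station_id": "string",
    "start_lat": "float64",
    "start_lng": "float64",
    "end_lat": "float64",
    "end_lng": "float64",
    "member_casual": "string",
    "ride_length": "float64",
    "day_of_week": "string",
    "hour_of_day": "int64",
}

# TIMESTAMP columns are parsed separately via parse_dates.
trips = pd.read_csv(
    sample_csv,
    dtype=SCHEMA_DTYPES,
    parse_dates=["started_at", "ended_at"],
)

print(trips.dtypes)
```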
