Cleaned & transformed Divvy bike-sharing trip data using Pandas and DuckDB
Dataset Description
📌 Overview
This dataset contains a cleaned and transformed version of the public
Divvy Bicycle Sharing Trip Data covering the period November 2024 to October 2025.
The original raw data is publicly released by the Chicago Open Data Portal
and has been cleaned using Pandas (Python) and DuckDB SQL for faster analysis.
This dataset is now ready for direct use in:
- Exploratory Data Analysis (EDA)
- SQL analytics
- Machine learning
- Time-series/trend analysis
- Dashboard creation (Power BI / Tableau)
📂 Source
Original Data Provider:
Chicago Open Data Portal – Divvy Trips
License: Open Data Commons Public Domain Dedication (PDDL)
This cleaned dataset contains only transformations of the public source data; no proprietary or restricted data is included.
🔧 Cleaning & Transformations Performed
- Combined monthly CSVs (Nov 2024 → Oct 2025)
- Removed duplicates
- Standardized datetime formats
- Created new fields: ride_length, day_of_week, hour_of_day
- Handled missing or null values
- Cleaned inconsistent station names
- Filtered invalid ride durations (negative or zero-length rides)
- Exported as a compressed .csv for optimized performance
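The steps above can be sketched in Pandas roughly as follows. This is a minimal illustration, not the exact pipeline used to build this dataset: the function name clean_trips, the sample rows, the choice to drop rows with unparseable timestamps, the whitespace-only station-name cleanup, and storing day_of_week as a day name are all assumptions made for the example.

```python
import pandas as pd


def clean_trips(monthly_frames):
    """Combine monthly trip frames and apply the cleaning steps listed above."""
    # Combine the monthly files into one table and drop repeated rides.
    trips = pd.concat(monthly_frames, ignore_index=True)
    trips = trips.drop_duplicates(subset="ride_id").copy()

    # Standardize datetimes; unparseable values become NaT.
    for col in ("started_at", "ended_at"):
        trips[col] = pd.to_datetime(trips[col], errors="coerce")

    # Assumption: rows without a usable start or end time are dropped.
    trips = trips.dropna(subset=["started_at", "ended_at"]).copy()

    # Assumption: station-name cleanup is limited to trimming whitespace here.
    for col in ("start_station_name", "end_station_name"):
        trips[col] = trips[col].str.strip()

    # Derived fields: ride_length in minutes, plus day and hour of the start time.
    duration = trips["ended_at"] - trips["started_at"]
    trips["ride_length"] = duration.dt.total_seconds() / 60
    trips["day_of_week"] = trips["started_at"].dt.day_name()  # assumption: names, not numbers
    trips["hour_of_day"] = trips["started_at"].dt.hour

    # Remove zero-length and negative rides.
    trips = trips[trips["ride_length"] > 0]
    return trips.reset_index(drop=True)


# Made-up sample rows standing in for two monthly CSVs.
nov = pd.DataFrame({
    "ride_id": ["A1", "A2", "A3"],
    "started_at": ["2024-11-01 08:00:00", "2024-11-01 09:30:00", "2024-11-02 10:00:00"],
    "ended_at": ["2024-11-01 08:15:00", "2024-11-01 09:30:00", "2024-11-02 09:50:00"],
    "start_station_name": [" Station A ", "Station B", "Station C"],
    "end_station_name": ["Station B", "Station B", None],
})
dec = pd.DataFrame({
    "ride_id": ["A1", "B1"],
    "started_at": ["2024-11-01 08:00:00", "2024-12-05 17:45:00"],
    "ended_at": ["2024-11-01 08:15:00", "2024-12-05 18:15:00"],
    "start_station_name": [" Station A ", "Station D"],
    "end_station_name": ["Station B", "Station A"],
})

cleaned = clean_trips([nov, dec])
print(cleaned[["ride_id", "ride_length", "day_of_week", "hour_of_day"]])
```

In the sample, the duplicate A1 row, the zero-length ride A2, and the negative-length ride A3 are removed, leaving two valid rides.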
📊 Columns in the Dataset
- ride_id
- rideable_type
- started_at
- ended_at
- start_station_name
- end_station_name
- start_lat
- start_lng
- end_lat
- end_lng
- member_casual
- ride_length (minutes)
- day_of_week
- hour_of_day
💡 Use Cases
This dataset is suitable for:
- DuckDB + SQL analytics
- Pandas EDA
- Visualization in Power BI, Tableau, Looker
- Statistical analysis
- Member vs. Casual rider behavioral analysis
- Peak usage prediction
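As a starting point for the member vs. casual comparison and peak-usage questions, here is a small Pandas sketch. It assumes member_casual holds the values "member" and "casual" and that ride_length is in minutes; the helper names and the sample rows are made up for illustration. The same aggregations could also be written as GROUP BY queries in DuckDB.

```python
import pandas as pd


def summarize_by_rider_type(trips):
    """Ride count and mean ride length (minutes) per rider type."""
    return (
        trips.groupby("member_casual")["ride_length"]
        .agg(rides="count", mean_minutes="mean")
        .reset_index()
    )


def peak_hour_by_rider_type(trips):
    """Hour of day with the most ride starts for each rider type."""
    counts = (
        trips.groupby(["member_casual", "hour_of_day"])
        .size()
        .reset_index(name="rides")
    )
    # For each rider type, keep the row with the highest ride count.
    busiest = counts.groupby("member_casual")["rides"].idxmax()
    return counts.loc[busiest].reset_index(drop=True)


# Made-up sample rows using the column names listed in this description.
trips = pd.DataFrame({
    "member_casual": ["member", "member", "member", "casual", "casual", "casual"],
    "hour_of_day": [8, 8, 17, 14, 14, 10],
    "ride_length": [10.0, 12.0, 14.0, 30.0, 20.0, 40.0],
})

summary = summarize_by_rider_type(trips)
peaks = peak_hour_by_rider_type(trips)
print(summary)
print(peaks)
```

With the real file, trips would instead be loaded with pd.read_csv on the downloaded CSV.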
📝 Notes
This dataset is not the official Divvy dataset, but a cleaned, transformed, and analysis-ready version created for educational and analytical use.