Baselight

Cyclistic

Bike share data cyclistic

@kaggle.salamibrahim_cyclistic

About this Dataset

Cyclistic

**Introduction**
This case study is based on Cyclistic, a bike sharing company in Chicago. I will perform the tasks of a junior data analyst to answer business questions, following a process with six phases: ask, prepare, process, analyze, share, and act.

Background
Cyclistic is a bike sharing company that operates 5,828 bikes across 692 docking stations. The company has been around since 2016 and sets itself apart from the competition by offering a variety of bike services, including assistive options. Lily Moreno is the director of the marketing team and will receive the insights from this analysis.

Case Study and business task
Lily Moreno's view is that Cyclistic can generate more income by marketing its services to convert casual riders (customers with one-day passes and/or pay-per-ride plans) into annual riders with a membership. According to the finance analysts, annual riders are more profitable than casual riders. She would rather run a campaign that converts casual riders into annual riders than launch campaigns targeting new customers. Her strategy as manager of the marketing team is therefore to maximize the number of annual riders by converting casual riders.

To make a data-driven decision, Moreno needs the following insights:

  • A better understanding of how casual riders and annual riders differ
  • Why would a casual rider become an annual one
  • How digital media can affect the marketing tactics

Moreno has directed me to the first question - how do casual riders and annual riders differ?

Stakeholders
Lily Moreno, manager of the marketing team
Cyclistic Marketing team
Executive team

Data sources and organization
The data used in this report is made available and licensed by Motivate International Inc. Personal data is hidden to protect riders' privacy. The data covers the past 12 months (01/04/2021 – 31/03/2022) of the bike share dataset.

Merging all 12 monthly bike share tables returned a large dataset of about 5.7 million rows (the tables listed below total 5,723,532 rows), which is the basis for this analysis.
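The merge step can be sketched as follows. This is a minimal illustration in Python, not the R workflow used for the analysis: the helper names `make_csv` and `merge_monthly` and the sample rows are made up for the example, and only the column names come from the table schemas listed on this page.

```python
import csv
import io

# Column order taken from the CREATE TABLE statements on this page.
COLUMNS = [
    "ride_id", "rideable_type", "started_at", "ended_at",
    "start_station_name", "start_station_id",
    "end_station_name", "end_station_id",
    "start_lat", "start_lng", "end_lat", "end_lng", "member_casual",
]


def make_csv(rows):
    """Build an in-memory CSV file (hypothetical stand-in for a monthly file)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    buf.seek(0)
    return buf


def merge_monthly(files):
    """Concatenate the rows of several CSV file objects that share COLUMNS."""
    merged = []
    for f in files:
        reader = csv.DictReader(f)
        # Refuse to merge a file whose header differs from the expected schema.
        if reader.fieldnames != COLUMNS:
            raise ValueError(f"unexpected header: {reader.fieldnames}")
        merged.extend(reader)
    return merged


# Two tiny made-up "months" stand in for the real monthly files.
april = make_csv([
    ["A1", "classic_bike", "2021-04-01 08:00:00", "2021-04-01 08:15:00",
     "", "", "", "", "41.9", "-87.6", "41.9", "-87.6", "member"],
])
may = make_csv([
    ["B1", "electric_bike", "2021-05-02 10:00:00", "2021-05-02 10:40:00",
     "", "", "", "", "41.8", "-87.7", "41.8", "-87.7", "casual"],
    ["B2", "classic_bike", "2021-05-03 09:00:00", "2021-05-03 09:05:00",
     "", "", "", "", "41.9", "-87.6", "41.9", "-87.6", "member"],
])

rides = merge_monthly([april, may])
print(len(rides))
```

Checking the header before appending is a design choice for the sketch: it makes a schema mismatch between months fail loudly instead of silently producing misaligned columns.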

Data security and limitations:
Personal information is secured and hidden to prevent unlawful use. Original files are backed up in folders and subfolders.

Tools and documentation of cleaning process
The tools used for data verification and data cleaning are Microsoft Excel and R programming. The original files made accessible by Motivate International Inc. are backed up in their original format and in separate files.

Microsoft Excel is used to look through the dataset and get an overview of its content. I performed simple checks by filtering, sorting, formatting, and standardizing the data to make it easy to merge. In Excel, I also changed data types to the right format, removed data that was incomplete or incorrect, created new columns by subtracting and reformatting existing columns, and deleted empty cells. These tasks are easily done in a spreadsheet and provide an initial cleaning pass over the data.
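As a rough illustration of these cleaning steps, the sketch below derives a ride length by subtracting `started_at` from `ended_at` and drops incomplete or implausible rows. It is written in Python for illustration only. The timestamp format, the new field names `ride_length_min` and `day_of_week`, and the rule that non-positive durations count as incorrect are all assumptions, not details taken from the original cleaning.

```python
from datetime import datetime

# Assumed timestamp layout; the real files may differ.
FMT = "%Y-%m-%d %H:%M:%S"
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def clean(rides):
    """Drop incomplete or invalid rides and add two derived columns."""
    cleaned = []
    for ride in rides:
        # Incomplete row: one of the timestamps is missing.
        if not ride["started_at"] or not ride["ended_at"]:
            continue
        start = datetime.strptime(ride["started_at"], FMT)
        end = datetime.strptime(ride["ended_at"], FMT)
        minutes = (end - start).total_seconds() / 60
        # Assumed rule: a ride that ends at or before its start is incorrect.
        if minutes <= 0:
            continue
        cleaned.append({
            **ride,
            "ride_length_min": minutes,            # hypothetical column name
            "day_of_week": DAYS[start.weekday()],  # hypothetical column name
        })
    return cleaned


# Made-up sample rows covering the valid, missing, and invalid cases.
sample = [
    {"ride_id": "A1", "started_at": "2021-04-01 08:00:00",
     "ended_at": "2021-04-01 08:15:00", "member_casual": "member"},
    {"ride_id": "A2", "started_at": "2021-04-01 09:00:00",
     "ended_at": "", "member_casual": "casual"},
    {"ride_id": "A3", "started_at": "2021-04-01 10:00:00",
     "ended_at": "2021-04-01 09:50:00", "member_casual": "casual"},
]

cleaned = clean(sample)
print(cleaned)
```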

R is used to query larger datasets such as this one and to create visualizations that answer the question at hand.
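One query that speaks to the question of how casual and annual riders differ is a per-group summary of ride counts and average ride length. The sketch below shows the idea in Python purely for illustration (the analysis itself uses R). It assumes each ride already carries a ride length in minutes under the hypothetical field name `ride_length_min`; the sample values are made up, and only `member_casual` comes from the table schemas.

```python
from collections import defaultdict


def summarize(rides):
    """Return ride count and mean ride length per rider type."""
    totals = defaultdict(lambda: {"rides": 0, "minutes": 0.0})
    for ride in rides:
        group = totals[ride["member_casual"]]
        group["rides"] += 1
        group["minutes"] += ride["ride_length_min"]
    return {
        rider_type: {
            "rides": t["rides"],
            "mean_minutes": t["minutes"] / t["rides"],
        }
        for rider_type, t in totals.items()
    }


# Made-up rides; real values would come from the merged, cleaned data.
sample = [
    {"member_casual": "casual", "ride_length_min": 20.0},
    {"member_casual": "casual", "ride_length_min": 40.0},
    {"member_casual": "member", "ride_length_min": 10.0},
]

summary = summarize(sample)
for rider_type, stats in sorted(summary.items()):
    print(rider_type, stats["rides"], stats["mean_minutes"])
```

The same grouping could be extended with other keys, such as weekday or `rideable_type`, to look for further differences between the two groups.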

Limitations
Microsoft Excel has a limit of 1,048,576 rows per worksheet, while the 12 months of data combined come to over 5,500,000 rows. Once the 12 months are combined into one table, Excel is no longer practical, so I switched to R.
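The size of the gap can be checked with a little arithmetic. The sketch below sums the row counts reported for the twelve monthly tables on this page and compares the total with the worksheet limit. The variable names are illustrative, and the sheet estimate ignores header rows.

```python
import math

EXCEL_MAX_ROWS = 1_048_576  # row limit of a single Excel worksheet

# Row counts as listed for each monthly table on this page.
MONTHLY_ROWS = {
    "202104": 337_230, "202105": 531_633, "202106": 729_595,
    "202107": 822_410, "202108": 804_352, "202109": 756_147,
    "202110": 631_226, "202111": 359_978, "202112": 247_540,
    "202201": 103_770, "202202": 115_609, "202203": 284_042,
}

total = sum(MONTHLY_ROWS.values())
# Minimum number of full worksheets needed to hold every row.
sheets_needed = math.ceil(total / EXCEL_MAX_ROWS)

print(total)                   # 5723532
print(total > EXCEL_MAX_ROWS)  # True
print(sheets_needed)           # 6
```

No single month exceeds the limit on its own, which is why the initial per-month checks in Excel were feasible while the combined table was not.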

Tables

N 202104 Divvy Tripdata

@kaggle.salamibrahim_cyclistic.n_202104_divvy_tripdata
  • 16.56 MB
  • 337230 rows
  • 13 columns

CREATE TABLE n_202104_divvy_tripdata (
  "ride_id" VARCHAR,
  "rideable_type" VARCHAR,
  "started_at" TIMESTAMP,
  "ended_at" TIMESTAMP,
  "start_station_name" VARCHAR,
  "start_station_id" VARCHAR,
  "end_station_name" VARCHAR,
  "end_station_id" VARCHAR,
  "start_lat" DOUBLE,
  "start_lng" DOUBLE,
  "end_lat" DOUBLE,
  "end_lng" DOUBLE,
  "member_casual" VARCHAR
);

N 202105 Divvy Tripdata

@kaggle.salamibrahim_cyclistic.n_202105_divvy_tripdata
  • 25.97 MB
  • 531633 rows
  • 13 columns

CREATE TABLE n_202105_divvy_tripdata (
  "ride_id" VARCHAR,
  "rideable_type" VARCHAR,
  "started_at" TIMESTAMP,
  "ended_at" TIMESTAMP,
  "start_station_name" VARCHAR,
  "start_station_id" VARCHAR,
  "end_station_name" VARCHAR,
  "end_station_id" VARCHAR,
  "start_lat" DOUBLE,
  "start_lng" DOUBLE,
  "end_lat" DOUBLE,
  "end_lng" DOUBLE,
  "member_casual" VARCHAR
);

N 202106 Divvy Tripdata

@kaggle.salamibrahim_cyclistic.n_202106_divvy_tripdata
  • 35.93 MB
  • 729595 rows
  • 13 columns

CREATE TABLE n_202106_divvy_tripdata (
  "ride_id" VARCHAR,
  "rideable_type" VARCHAR,
  "started_at" TIMESTAMP,
  "ended_at" TIMESTAMP,
  "start_station_name" VARCHAR,
  "start_station_id" VARCHAR,
  "end_station_name" VARCHAR,
  "end_station_id" VARCHAR,
  "start_lat" DOUBLE,
  "start_lng" DOUBLE,
  "end_lat" DOUBLE,
  "end_lng" DOUBLE,
  "member_casual" VARCHAR
);

N 202107 Divvy Tripdata

@kaggle.salamibrahim_cyclistic.n_202107_divvy_tripdata
  • 39.71 MB
  • 822410 rows
  • 13 columns

CREATE TABLE n_202107_divvy_tripdata (
  "ride_id" VARCHAR,
  "rideable_type" VARCHAR,
  "started_at" TIMESTAMP,
  "ended_at" TIMESTAMP,
  "start_station_name" VARCHAR,
  "start_station_id" VARCHAR,
  "end_station_name" VARCHAR,
  "end_station_id" VARCHAR,
  "start_lat" DOUBLE,
  "start_lng" DOUBLE,
  "end_lat" DOUBLE,
  "end_lng" DOUBLE,
  "member_casual" VARCHAR
);

N 202108 Divvy Tripdata

@kaggle.salamibrahim_cyclistic.n_202108_divvy_tripdata
  • 38.55 MB
  • 804352 rows
  • 13 columns

CREATE TABLE n_202108_divvy_tripdata (
  "ride_id" VARCHAR,
  "rideable_type" VARCHAR,
  "started_at" TIMESTAMP,
  "ended_at" TIMESTAMP,
  "start_station_name" VARCHAR,
  "start_station_id" VARCHAR,
  "end_station_name" VARCHAR,
  "end_station_id" VARCHAR,
  "start_lat" DOUBLE,
  "start_lng" DOUBLE,
  "end_lat" DOUBLE,
  "end_lng" DOUBLE,
  "member_casual" VARCHAR
);

N 202109 Divvy Tripdata

@kaggle.salamibrahim_cyclistic.n_202109_divvy_tripdata
  • 36.98 MB
  • 756147 rows
  • 13 columns

CREATE TABLE n_202109_divvy_tripdata (
  "ride_id" VARCHAR,
  "rideable_type" VARCHAR,
  "started_at" TIMESTAMP,
  "ended_at" TIMESTAMP,
  "start_station_name" VARCHAR,
  "start_station_id" VARCHAR,
  "end_station_name" VARCHAR,
  "end_station_id" VARCHAR,
  "start_lat" DOUBLE,
  "start_lng" DOUBLE,
  "end_lat" DOUBLE,
  "end_lng" DOUBLE,
  "member_casual" VARCHAR
);

N 202110 Divvy Tripdata

@kaggle.salamibrahim_cyclistic.n_202110_divvy_tripdata
  • 32.2 MB
  • 631226 rows
  • 13 columns

CREATE TABLE n_202110_divvy_tripdata (
  "ride_id" VARCHAR,
  "rideable_type" VARCHAR,
  "started_at" TIMESTAMP,
  "ended_at" TIMESTAMP,
  "start_station_name" VARCHAR,
  "start_station_id" VARCHAR,
  "end_station_name" VARCHAR,
  "end_station_id" VARCHAR,
  "start_lat" DOUBLE,
  "start_lng" DOUBLE,
  "end_lat" DOUBLE,
  "end_lng" DOUBLE,
  "member_casual" VARCHAR
);

N 202111 Divvy Tripdata

@kaggle.salamibrahim_cyclistic.n_202111_divvy_tripdata
  • 19.25 MB
  • 359978 rows
  • 13 columns

CREATE TABLE n_202111_divvy_tripdata (
  "ride_id" VARCHAR,
  "rideable_type" VARCHAR,
  "started_at" TIMESTAMP,
  "ended_at" TIMESTAMP,
  "start_station_name" VARCHAR,
  "start_station_id" VARCHAR,
  "end_station_name" VARCHAR,
  "end_station_id" VARCHAR,
  "start_lat" DOUBLE,
  "start_lng" DOUBLE,
  "end_lat" DOUBLE,
  "end_lng" DOUBLE,
  "member_casual" VARCHAR
);

N 202112 Divvy Tripdata

@kaggle.salamibrahim_cyclistic.n_202112_divvy_tripdata
  • 12.62 MB
  • 247540 rows
  • 13 columns

CREATE TABLE n_202112_divvy_tripdata (
  "ride_id" VARCHAR,
  "rideable_type" VARCHAR,
  "started_at" TIMESTAMP,
  "ended_at" TIMESTAMP,
  "start_station_name" VARCHAR,
  "start_station_id" VARCHAR,
  "end_station_name" VARCHAR,
  "end_station_id" VARCHAR,
  "start_lat" DOUBLE,
  "start_lng" DOUBLE,
  "end_lat" DOUBLE,
  "end_lng" DOUBLE,
  "member_casual" VARCHAR
);

N 202201 Divvy Tripdata

@kaggle.salamibrahim_cyclistic.n_202201_divvy_tripdata
  • 5.28 MB
  • 103770 rows
  • 13 columns

CREATE TABLE n_202201_divvy_tripdata (
  "ride_id" VARCHAR,
  "rideable_type" VARCHAR,
  "started_at" TIMESTAMP,
  "ended_at" TIMESTAMP,
  "start_station_name" VARCHAR,
  "start_station_id" VARCHAR,
  "end_station_name" VARCHAR,
  "end_station_id" VARCHAR,
  "start_lat" DOUBLE,
  "start_lng" DOUBLE,
  "end_lat" DOUBLE,
  "end_lng" DOUBLE,
  "member_casual" VARCHAR
);

N 202202 Divvy Tripdata

@kaggle.salamibrahim_cyclistic.n_202202_divvy_tripdata
  • 5.83 MB
  • 115609 rows
  • 13 columns

CREATE TABLE n_202202_divvy_tripdata (
  "ride_id" VARCHAR,
  "rideable_type" VARCHAR,
  "started_at" TIMESTAMP,
  "ended_at" TIMESTAMP,
  "start_station_name" VARCHAR,
  "start_station_id" VARCHAR,
  "end_station_name" VARCHAR,
  "end_station_id" VARCHAR,
  "start_lat" DOUBLE,
  "start_lng" DOUBLE,
  "end_lat" DOUBLE,
  "end_lng" DOUBLE,
  "member_casual" VARCHAR
);

N 202203 Divvy Tripdata

@kaggle.salamibrahim_cyclistic.n_202203_divvy_tripdata
  • 13.58 MB
  • 284042 rows
  • 13 columns

CREATE TABLE n_202203_divvy_tripdata (
  "ride_id" VARCHAR,
  "rideable_type" VARCHAR,
  "started_at" TIMESTAMP,
  "ended_at" TIMESTAMP,
  "start_station_name" VARCHAR,
  "start_station_id" VARCHAR,
  "end_station_name" VARCHAR,
  "end_station_id" VARCHAR,
  "start_lat" DOUBLE,
  "start_lng" DOUBLE,
  "end_lat" DOUBLE,
  "end_lng" DOUBLE,
  "member_casual" VARCHAR
);
