Baselight

Cyclistic Bikes Case Study

Google Data Analytics Capstone

@kaggle.swilliams2070_cyclistic_bikes_case_study

About this Dataset

Cyclistic Bikes Case Study

Cyclistic Bikes, Case study with SQL and Tableau.

Scenario.

As a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members.

About the Company.

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.
A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day

Stakeholders

Lily Moreno Director of Marketing and Cyclistic marketing analytics team.

Business Task.

Design marketing strategies aimed at converting casual riders into annual members

  1. How do annual members and casual riders use Cyclistic bikes differently?
  2. Why would casual riders buy Cyclistic annual memberships?
  3. How can Cyclistic use digital media to influence casual riders to become members?

Data Source

Data has been made available by Motivate International Inc under this licence which can be found at https://ride.divvybikes.com/data-license-agreement

Summary, Cyclistic Bike Share

Summary WINTER 01-03

Lowest number of casual riders for the year are in winter 97,382 casual riders.
Number of lowest daily casual riders recorded between Monday and Friday 7,435 down from 144,144 riders in summer.
Lowest number of casual riders on the weekend, Sunday 19,236 down from 219,118 riders in summer.
Three times more member riders than casual riders between Monday and Friday during Winter.
Casual riders spend double the amount of time on bikes than member riders all year round, average time on bikes ranges between 18 to 36 minutes.
Peak time for casual riders is 17:00 hrs during the week and at the weekend 14:00 hrs

Summary SPRING 04-06

An increase of 532,081 spring casual riders, 629463 is a 546.39% increase from 97,382 casual winter riders.
More spring casual riders than member riders at the weekend during spring season. Saturday(141,850) and Sunday(133,125)
Average time on bikes for casual riders ranges between 27 to 36 minutes.
Peak time for spring casual riders is 17:00 hrs Monday, Tuesday, Wednesday and Friday. Peak time on Thursday is 18:00 hrs. Peak time for casual riders on Saturday is 16:00 hrs on Sunday 14:00 hrs.

Summary SUMMER 07-09

An increase of 356,274 summer casual riders, 985,737 is a 56.6% increase from 629463 casual spring riders.
More summer casual riders than member riders at the weekend during summer season.
Saturday (219,118) an increase of 54.472% and Sunday (186,871) an increase of 40.373
Lower average time on bikes for summer casual riders ranges between 24 to 31 minutes.
Peak time for summer casual riders is 17:00 hrs during the week, 14:00 hrs on Saturday and 15:00 hrs Sunday.

Summary AUTUMN 10-12

A decrease of 687,123 casual riders, 298614 is a 69.707% decrease from 985737 casual summer riders.
Less autumn casual riders than member riders at the weekend during autumn season.
Saturday (72,616) a decrease of 66.86% and Sunday (56,990) a decrease of 69.503%
Average time on bikes for autumn casual riders ranges between 18 to 27 minutes.
Peak time for autumn casual riders is 17:00 hrs during the week, 13:00 hrs on Saturday and 15:00 hrs Sunday.

Rideable Type Summary

Only 1 member used a Docked Bike.

Top 10 Stations Summary

Data Cleaning & Manipulation – SQL

Join datasets divvy_01_raw, divvy_02_raw and divvy_03_raw with SQL

SELECT ride_id, rideable_type, started_at, ended_at, start_station_name, start_station_id, end_station_name, end_station_id, start_lat, start_lng, end_lat, end_lng, member_casual
FROM custom-bond-345614.CityBikes_CaseStudy.divvy_01_raw
UNION ALL
SELECT ride_id, rideable_type, started_at, ended_at, start_station_name, start_station_id, end_station_name, end_station_id, start_lat, start_lng, end_lat, end_lng, member_casual
FROM custom-bond-345614.CityBikes_CaseStudy.divvy_02_raw
UNION ALL
SELECT ride_id, rideable_type, started_at, ended_at, start_station_name, start_station_id, end_station_name, end_station_id, start_lat, start_lng, end_lat, end_lng, member_casual
FROM custom-bond-345614.CityBikes_CaseStudy.divvy_03_raw

Save Results as BigQuery table called WINTER_01-03_RAW

Population Size: 374952

Save Results as BigQuery table called SPRING_04-06_RAW

Population Size: 1598458

Save Results as BigQuery table called SUMMER_07-09_RAW

Population Size: 2382909

Save Results as BigQuery table called AUTUMN_10-12_RAW

Population Size: 1238744

Total Population size before cleaning 5,595,063

Looking for duplicates

SELECT DISTINCT ride_id,
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW
Query Results no duplicates

Inspect a column with string data. (ride_id)

SELECT LENGTH(ride_id) AS Numbers_ride_id
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW
Query Results = 16 characters long

SELECT ride_id
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW
WHERE LENGTH(ride_id) > 16
There is no data to display.

SELECT ride_id
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW
WHERE LENGTH(ride_id) < 16
There is no data to display.

Inspect another column with string data. (rideable_type)

SELECT LENGTH(rideable_type) AS Numbers,
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW
Query Results range between 11 and 13

docked_bike =11 characters long
classic_bike =12 characters long
electric_bike =13 characters long

SELECT COUNT(rideable_type) AS Numbers,
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW
WHERE LENGTH(rideable_type) =11
Query Results: 19034 = docked_bike

SELECT COUNT(rideable_type) AS Numbers,
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW
WHERE LENGTH(rideable_type) =12
Query Results: 249257 = classic_bike

SELECT COUNT(rideable_type) AS Numbers,
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW
WHERE LENGTH(rideable_type) =13
Query Results: 106661 = electric_bike

Check columns, start_station_name for null or missing values.

SELECT *
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW
WHERE start_station_name IS NULL

Delete Null Values

DELETE FROM `custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW.
WHERE start_station_name IS NULL

Query Results:
This statement removed 27519 rows from WINTER_01-03_RAW`
This statement removed 159,893 rows from SPRING_04-06_RAW.
This statement removed 268,834 rows from SUMMER_07-09_RAW.
This statement removed 234,563 rows from AUTUMN_10-12_RAW.

Check column, end_station_name for null or missing values.

SELECT *
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW
WHERE end_station_name IS NULL

Delete Null Values where end_station_name is null or values missing.

DELETE FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW
WHERE end_station_name IS NULL

Query Results:
This statement removed 15,237 rows from WINTER_01-03_RAW.
This statement removed 80,586 rows from SPRING_04-06_RAW.
This statement removed 126195 rows from SUMMER_07-09_RAW.
This statement removed 93,934 rows from AUTUMN_10-12_RAW.

Percentage of NULL values removed and how many entries remain

SELECT CONCAT((328497/ COUNT(ride_id)) * 100, "%")
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW

12.87% of the rows with null values have been removed from WINTER_01-03_RAW
42756 rows removed, 332196 entries remain in WINTER_01-03_RAW

17.70% of the rows with null values have been removed from SPRING_04-06_RAW
240479 rows removed, 1357979 entries remain in SPRING_04-06_RAW

19.87% of the rows with null values have been removed from SUMMER_07-09_RAW
395029 rows removed, 1987880 entries remain in SUMMER_07-09_RAW

36.08% of the rows with null values have been removed from AUTUMN_10-12_RAW
328497 rows removed, 910247 entries remain in AUTUMN_10-12_RAW

Query creates new column (ride_length), calculating the length of each ride in minutes a new

column (day_of_week), calculating the day of the ride. Month, calculating month of ride.

SELECT ride_id, rideable_type, started_at, ended_at,
ROUND(TIMESTAMP_DIFF(ended_at, started_at, second)/60, 1) AS Ride_Length,
EXTRACT(DAYOFWEEK FROM started_at) AS Day_of_Week,
EXTRACT(MONTH FROM started_at) AS Month,
start_station_name, start_station_id, end_station_name, end_station_id, start_lat, start_lng, end_lat, end_lng, member_casual,
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW

Save query as a BigQuery table called WINTER_01-03_RAW_A,

Save query as a BigQuery table called SPRING_04-06_RAW_A

Save query as a BigQuery table called SUMMER_07-09_RAW_A

Save query as a BigQuery table called AUTUMN_10-12_RAW_A

How many rides are equal to or less than two minutes?

SELECT COUNT(*) AS Ride_Under_2min
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW_A
WHERE Ride_Length <= 02.0

Delete the rides, equal to or less than two minutes

DELETE FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW_A
WHERE Ride_Length <= 02.0

Query Results:
This statement removed 8,050 rows from WINTER_01-03_RAW_A.
This statement removed 33129 rows from SPRING_04-06_RAW_A.
This statement removed 48,984 rows from SUMMER_07-09_RAW_A.
This statement removed 28,737 rows from AUTUMN_10-12_RAW_A.

How many rides are equal to or greater than twenty four hours?

SELECT COUNT(*) AS Rides_Over_24hrs
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW_A
WHERE Ride_Length >= 1440

Delete the rides equal to or greater than twenty four hours.

DELETE FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW_A
WHERE Ride_Length = > 1440

Query Results:
This statement removed 129 rows from WINTER_01-03_RAW_A.
This statement removed 584 rows from SPRING_04-06_RAW_A.
This statement removed 403 rows from SUMMER_07-09_RAW_A.
This statement removed 159 rows from AUTUMN_10-12_RAW_A.

Percentage of ride length, equal to or more than 24hrs removed and how many entries remain

SELECT CONCAT((28896/ COUNT(Ride_Length)) * 100, "%"),
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW_A

2.52% of the rows in ride length, equal to or more than 24hrs removed WINTER_01-03_RAW_A 8179 rows removed, 324017 entries remain WINTER_01-03_RAW_A

2.54% of the rows in ride length, equal to or more than 24hrs removed SPRING_04-06_RAW_A`
33713 rows removed, 1324266 entries remain SPRING_04-06_RAW_A.

2.54% of the rows in ride length, equal to or more than 24hrs removed SUMMER_07-09_RAW_A 49387 rows removed, 1938493 entries remain SUMMER_07-09_RAW_A

3.27% of the rows in ride length, equal to or more than 24hrs removed AUTUMN_10-12_RAW_A
28896 rows removed, 881351 entries remain AUTUMN_10-12_RAW_A

Sort data into chronological order

SELECT *
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW_A
ORDER BY started_at

Save query as a BigQuery table called WINTER_01-03_CLEANED

Population Size: 324017

Save query as a BigQuery table called SPRING_04-06_CLEANED

Population Size: 1324266

Save query as a BigQuery table called SUMMER_07-09_CLEANED

Population Size: 1938493

Save query as a BigQuery table called AUTUMN_10-12_CLEANED

Population Size: 881351

Total Population size after cleaning 4,468,127

Changed day_of_week column using CASE function so numbers 1 to 7 will change to read Sunday to Saturday

(1 = Sunday, 2 = Monday, 3 = Tuesday, 4 = Wednesday, 5 = Thursday, 6 = Friday, 7 = Saturday).

Total numbers of rides for members and casual riders, for every hour of the day 0 hrs to 23 hrs.

SELECT CASE
WHEN day_of_week = 1 THEN 'Sunday'
WHEN day_of_week = 2 THEN 'Monday'
WHEN day_of_week = 3 THEN 'Tuesday'
WHEN day_of_week = 4 THEN 'Wednesday'
WHEN day_of_week = 5 THEN 'Thursday'
WHEN day_of_week = 6 THEN 'Friday'
WHEN day_of_week = 7 THEN 'Saturday' END AS day_of_the_week,
EXTRACT(hour FROM started_at) AS hour,
COUNT(*) AS Rides_Members
FROM custom-bond-345614.CityBikes_CaseStudy.AUTUMN_10-12_CLEANED
WHERE member_casual = 'member'
GROUP BY day_of_week, hour
ORDER BY day_of_week, hour

Population size for member and casual riders, WINTER_01-03_CLEANED

SELECT COUNT(member_casual) AS Member_Riders
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_CLEANED
WHERE member_casual = 'member'
Query Results: 226635

Save results as CSV(local file) renamed file to POP SIZE WINTER MEMBER

SELECT COUNT(member_casual) AS Casual_Riders
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_CLEANED
WHERE member_casual = 'casual'
Query Results: 97382
Save results as CSV(local file) renamed file to POP SIZE WINTER CASUAL

Average, minimum, maximum ride length in minutes for member riders

SELECT
AVG(Ride_Length) AS Member_Avg,
MIN(Ride_Length) AS Member_Min,
MAX(Ride_Length) AS Member_Max,
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW_A
WHERE member_casual= 'member'

Save results as CSV(local file) renamed file to AVG_MIN_MAX_Member W

Average, minimum, maximum ride length in minutes for casual riders

SELECT
AVG(Ride_Length) AS Casual_Avg,
MIN(Ride_Length) AS Casual_Min,
MAX(Ride_Length) AS Casual_Max,
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_RAW_A
WHERE member_casual= 'casual'

Save results as CSV(local file) renamed file to AVG_MIN_MAX_Casual W

SELECT *
FROM custom-bond-345614.CityBikes_CaseStudy.AUTUMN_10-12_AVG_MIN_MAX_casual
UNION ALL
SELECT *
FROM custom-bond-345614.CityBikes_CaseStudy.SUMMER_07-09_AVG_MIN_MAX_casual
UNION ALL
SELECT *
FROM custom-bond-345614.CityBikes_CaseStudy.SPRING_04-06_AVG_MIN_MAX_casual
UNION ALL
SELECT *
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_AVG_MIN_MAX_casual
SELECT *
FROM custom-bond-345614.CityBikes_CaseStudy.AUTUMN_10-12_AVG_MIN_MAX_member
UNION ALL
SELECT *
FROM custom-bond-345614.CityBikes_CaseStudy.SUMMER_07-09_AVG_MIN_MAX_member
UNION ALL
SELECT *
FROM custom-bond-345614.CityBikes_CaseStudy.SPRING_04-06_AVG_MIN_MAX_member
UNION ALL
SELECT *
FROM custom-bond-345614.CityBikes_CaseStudy.WINTER_01-03_AVG_MIN_MAX_member

Total number of rides on each day of week for casual & member riders

SELECT day_of_week, COUNT(*) AS Total_Rides_Members
FROM custom-bond-345614.CityBikes_CaseStudy.AUTUMN_10-12_CLEANED
WHERE member_casual = 'member'
GROUP BY day_of_week
ORDER BY day_of_week

SELECT day_of_week, COUNT(*) AS Total_Rides_Casual
FROM custom-bond-345614.CityBikes_CaseStudy.AUTUMN_10-12_CLEANED
WHERE member_casual = 'casual'
GROUP BY day_of_week
ORDER BY day_of_week

Calculate the average ride length for each day of the week, for both members and casual riders.

SELECT day_of_week, AVG(ride_length) AS member_avg_ride
FROM custom-bond-345614.CityBikes_CaseStudy.AUTUMN_10-12_CLEANED
WHERE member_casual= 'member'
GROUP BY day_of_week
ORDER BY day_of_week

SELECT day_of_week, AVG(ride_length) AS casual_avg_ride_
FROM custom-bond-345614.CityBikes_CaseStudy.AUTUMN_10-12_CLEANED
WHERE member_casual= 'casual'
GROUP BY day_of_week
ORDER BY day_of_week

Rideable Type differences in the type of bikes chosen.

SELECT rideable_type,
COUNT(*) AS Member_Total_Rides
FROM custom-bond-345614.CityBikes_CaseStudy.AUTUMN_10-12_CLEANED
WHERE member_casual = 'member'
GROUP BY rideable_type

SELECT rideable_type,
COUNT(*) AS Casual_Total_Rides
FROM custom-bond-345614.CityBikes_CaseStudy.AUTUMN_10-12_CLEANED
WHERE member_casual = 'casual'
GROUP BY rideable_type

Ten most popular start stations for members and casual riders

SELECT start_station_name,
COUNT() AS Casual_Total_Rides
FROM custom-bond-345614.CityBikes_CaseStudy.AUTUMN_10-12_CLEANED
WHERE member_casual = 'casual'
GROUP BY start_station_name
ORDER BY COUNT(
) DESC LIMIT 10

Ten most popular end stations for members and casual riders

SELECT end_station_name,
COUNT() AS Member_Total_Rides
FROM custom-bond-345614.CityBikes_CaseStudy.AUTUMN_10-12_CLEANED
WHERE member_casual = 'member'
GROUP BY end_station_name
ORDER BY COUNT(
) DESC LIMIT 10

SELECT end_station_name,
COUNT() AS Member_Total_Rides
FROM custom-bond-345614.CityBikes_CaseStudy.AUTUMN_10-12_CLEANED
WHERE member_casual = 'casual'
GROUP BY end_station_name
ORDER BY COUNT(
) DESC LIMIT 10

Repeat above steps for SPRING, SUMMER, AUTUMN.

Use saved results, CSV(local file) with Tableau

This took way longer than ever expected and I do not know if this project is correct…

Constructive Criticism Welcome

R Studio version of this casestudy, can be found at....

https://rpubs.com/Stephen_Williams/1040064 or dashboard at https://sw2070.github.io/SW-LEARNING-MATERIALS/#january-to-march-2021

SW