Cyclistic Capstone Project

This project uses R to analyze Cyclistic's ride data and make recommendations to the company's owner.

Table of Contents

Introduction
Data
Analysis
Recap
Action

Cyclistic Capstone Concluded!

Although I've never had the chance to use the service, I clearly remember the first time I saw a bike-for-hire service in my city. My first thought was, "What is that bike doing there? I hope nobody steals it!" Then my mom (who reads the paper because she's a responsible adult 😜) explained to me that it was a new business model, and I was fascinated! I'm always impressed by entrepreneurs; turning a simple concept into a money-making opportunity is really admirable.

So when I had the chance to explore a dataset for a bike rental company through my capstone project for the Google Data Analytics certificate, I jumped at the opportunity to identify how annual members and casual riders use Cyclistic bikes differently, with the goal of finding ways to convert casual riders to members.

Using RStudio, I imported the 12 CSV data files covering September 2021 through August 2022 and discovered:

  • Members rent more frequently, but casual riders take longer rides, on average

  • Late afternoons, weekends, and warmer months are the most popular rental times for riders

  • The Streeter Drive and Grand Avenue start station rents nearly twice as many bikes as any other station among the 10 most in-demand start stations

First, the Data!

​

This dataset comes from the Google Data Analytics Certificate Capstone Project.

​

The dataset is available here: Divvy Trip Data. I used the data files for September 2021 through August 2022, which follow the naming convention YYYYMM-divvy-tripdata.

Of the thirteen columns, some of the most important are the ride ID, bike type, start date and time, end date and time, start station name, end station name, and member type (member or casual). Since the files range in size from 103,771 rows to 823,489 rows, I decided to use RStudio for the data cleaning and analysis.

To begin, I loaded the following packages: tidyverse, lubridate, janitor, scales, readr, and forecast. Next, I imported the 12 CSV files, combined them into a single data frame, used the janitor package to clean the rows and columns, and dropped all columns except the necessary ones (rideable_type, started_at, ended_at, start_station_name, member_casual). Additionally, I removed all rows with an empty start station because my analysis was aimed at determining the most popular start station.
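
The import, combine, and trim steps described above could look roughly like the sketch below. It is not the project's actual code: it is written in base R so it runs without extra packages (the project itself used tidyverse and janitor), and the two tiny CSV files, their contents, the divvy_demo folder, and the station names B and C are made up for illustration.

```r
# Hypothetical stand-ins for the monthly files: two tiny CSVs in a temp folder.
data_dir <- file.path(tempdir(), "divvy_demo")
dir.create(data_dir, showWarnings = FALSE)

write.csv(
  data.frame(
    ride_id = c("A1", "A2"),
    rideable_type = c("classic_bike", "electric_bike"),
    started_at = c("2021-09-01 08:00:00", "2021-09-01 17:30:00"),
    ended_at = c("2021-09-01 08:12:00", "2021-09-01 17:55:00"),
    start_station_name = c("Streeter Drive and Grand Avenue", ""),
    member_casual = c("member", "casual")
  ),
  file.path(data_dir, "202109-divvy-tripdata.csv"),
  row.names = FALSE
)
write.csv(
  data.frame(
    ride_id = c("B1", "B2"),
    rideable_type = c("classic_bike", "classic_bike"),
    started_at = c("2021-10-02 10:00:00", "2021-10-02 11:00:00"),
    ended_at = c("2021-10-02 10:20:00", "2021-10-02 11:45:00"),
    start_station_name = c("Station B", "Station C"),
    member_casual = c("casual", "member")
  ),
  file.path(data_dir, "202110-divvy-tripdata.csv"),
  row.names = FALSE
)

# Read every file that follows the YYYYMM-divvy-tripdata naming convention
# and stack them into a single data frame.
files <- list.files(data_dir, pattern = "-divvy-tripdata\\.csv$", full.names = TRUE)
trips <- do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))

# Keep only the columns needed for the analysis.
keep <- c("rideable_type", "started_at", "ended_at",
          "start_station_name", "member_casual")
trips <- trips[, keep]

# Drop rows whose start station is missing or empty.
trips <- trips[!is.na(trips$start_station_name) & trips$start_station_name != "", ]
```

With the real data, data_dir would simply point at the folder holding the 12 monthly files.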

​

Next, I converted the start and end timestamps to a proper date-time format so that I could use them to calculate ride lengths.

Once this was done, I was able to calculate trip duration. At this point, some data cleaning was needed to remove rides that lasted less than a minute (574 rides in total).
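
As a rough illustration of the timestamp conversion and trip-duration steps, here is a minimal base R sketch. The three sample rides, the ride_length_min column name, and the UTC time zone are assumptions for the example, not details taken from the project.

```r
# Made-up sample rides with timestamps stored as text, as they arrive from CSV.
trips <- data.frame(
  started_at = c("2022-07-04 15:00:00", "2022-07-04 16:10:00", "2022-07-05 09:00:00"),
  ended_at   = c("2022-07-04 15:25:00", "2022-07-04 16:10:30", "2022-07-05 09:08:00"),
  stringsAsFactors = FALSE
)

# Convert the text columns to date-times (time zone assumed to be UTC here).
fmt <- "%Y-%m-%d %H:%M:%S"
trips$started_at <- as.POSIXct(trips$started_at, format = fmt, tz = "UTC")
trips$ended_at   <- as.POSIXct(trips$ended_at, format = fmt, tz = "UTC")

# Trip duration in minutes.
trips$ride_length_min <- as.numeric(
  difftime(trips$ended_at, trips$started_at, units = "mins")
)

# Remove rides shorter than one minute.
trips <- trips[trips$ride_length_min >= 1, ]
```

In this toy data the 30-second ride is dropped and the 25-minute and 8-minute rides remain.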

All of the details of the code I wrote for this project, including the graphs, can be found in my RMarkdown file.

The Analysis

Since the main task is to understand how casual riders and members differ, in hopes of converting more casual riders to members, I thought that understanding the rider base would be a solid first step.

Next, I wondered whether the riding habits of casual riders and members differed. As it turns out, members rent Cyclistic bikes more often, on average, throughout the year.

Discovering this made me curious about how long each type of rider typically rides. This time, it was the casual riders who consistently rode longer than members.
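
A comparison like this (ride counts and average trip duration per rider type) can be sketched in base R as follows. The five sample rides and the ride_length_min column name are invented for the example; the real figures are in the graphs from the project.

```r
# Made-up sample: rider type and trip duration in minutes.
trips <- data.frame(
  member_casual = c("member", "member", "member", "casual", "casual"),
  ride_length_min = c(10, 12, 8, 25, 35)
)

# Number of rides and mean duration for each rider type.
rides <- table(trips$member_casual)
mean_min <- tapply(trips$ride_length_min, trips$member_casual, mean)

summary_by_type <- data.frame(
  member_casual = names(mean_min),
  rides = as.integer(rides[names(mean_min)]),
  mean_min = as.numeric(mean_min)
)
```

Here summary_by_type has one row per rider type, which makes it easy to see both patterns side by side: more rides for one group, longer rides for the other.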

I then turned my attention to looking for trends at the daily, weekly, and yearly level, and discovered that 3 p.m. to 6 p.m. were the peak ride times.
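
One simple way to look for peak hours is to pull the hour out of each start time and count rides per hour. The sketch below does this in base R with made-up start times and an assumed UTC time zone.

```r
# Made-up ride start times (time zone assumed to be UTC).
started_at <- as.POSIXct(
  c("2022-07-04 15:05:00", "2022-07-04 16:40:00", "2022-07-04 17:15:00",
    "2022-07-04 08:30:00", "2022-07-05 17:45:00"),
  tz = "UTC"
)

# Hour of day (0-23) for each ride, then the number of rides in each hour.
start_hour <- as.integer(format(started_at, "%H"))
rides_by_hour <- table(start_hour)

# The hour with the most rides in this sample.
peak_hour <- as.integer(names(rides_by_hour)[which.max(rides_by_hour)])
```

The same idea works at the yearly level by formatting the start time as a month ("%m") instead of an hour.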

While weekends were more popular for casual riders, members rode more during the week.
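
A weekday versus weekend split by rider type can be sketched like this in base R. The six sample rides, the day_type column, and the UTC time zone are assumptions made for the example.

```r
# Made-up rides: 2022-07-02 and 2022-07-09 are Saturdays, 2022-07-03 is a Sunday.
trips <- data.frame(
  started_at = as.POSIXct(
    c("2022-07-02 11:00:00", "2022-07-03 14:00:00", "2022-07-04 08:00:00",
      "2022-07-05 08:15:00", "2022-07-06 17:30:00", "2022-07-09 13:00:00"),
    tz = "UTC"
  ),
  member_casual = c("casual", "casual", "member", "member", "member", "casual")
)

# POSIXlt stores the day of the week as 0 (Sunday) through 6 (Saturday).
wday <- as.POSIXlt(trips$started_at)$wday
trips$day_type <- ifelse(wday %in% c(0, 6), "weekend", "weekday")

# Cross-tabulate rider type against weekday/weekend.
rides_by_day_type <- table(trips$member_casual, trips$day_type)
```

Using the numeric day of the week avoids depending on locale-specific day names.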

Seasonally, the warmer months of June through October were the busiest.

Finally, I wanted to find out which start stations were the most popular, for marketing and promotional purposes, and identified the top 10 start stations.
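
Ranking start stations by ride count could be done along these lines in base R. The ride counts and the names Station B and Station C are invented for the example.

```r
# Made-up start stations for seven rides.
start_station_name <- c(
  rep("Streeter Drive and Grand Avenue", 4),
  rep("Station B", 2),
  "Station C"
)

# Count rides per station, sort from busiest to quietest, keep at most the top 10.
station_counts <- sort(table(start_station_name), decreasing = TRUE)
top_stations <- head(station_counts, 10)
```

With only three stations in the sample, top_stations holds all three; on the full dataset it would hold the ten busiest.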

Recap & Recommend

​

The initial task was to investigate opportunities to convert casual riders to members using data analysis techniques in R. The analysis showed that:

  • In general, members rent more frequently, but casual riders stay on the bikes for longer

  • The busiest rental periods for riders are in the late afternoon, on weekends, and throughout the warmer months

  • The Streeter Drive and Grand Avenue start station rents almost twice as many bikes as any other station among the ten start stations with the highest demand

​​

Based on this information, these are my recommendations to increase the number of member riders:

  • The Streeter Drive and Grand Avenue start station should be the focus of the marketing campaign, as it is where most casual riders begin

  • Increase marketing to casual riders when they ride the most: during late afternoons, weekends, and warmer months

  • Late winter/early spring discounts could encourage casual riders to convert to members, especially if they receive a summary of their frequency and distance usage from the prior summer

​​

You can view this information as a PowerPoint presentation here: Cyclistic

Action!

​

I thank you for reading and welcome your feedback! Please consider following me or connecting on LinkedIn at Carly Jocson. And keep me in mind for any remote positions as a data analyst!
