Analysis of A Year of Divvy Bike Riders
Exploratory Analysis for December 2020 - November 2021 Historical Data
In the last year I moved from the very rural Poconos to the suburbs of Seattle, near where I grew up. A number of changes have occurred since I left the area, but some of the notable changes are related to the population increase, housing changes, and the new light-rail which is currently under construction. Traffic congestion has been a problem for as long as I can remember, but luckily, Seattle is home to a great public transportation system, an established multi-use network of trails and bridges, and marked traffic lanes for bikes on many roadways. With moderate, albeit drizzly weather year-round, getting around the city using a combination of walking, biking and public transportation can be affordable, pleasant and a great way to get some needed exercise.
With so many options for getting around in Seattle, many families are able to voluntarily downsize to a single car. For some, however, the downsizing is not by choice. As the cost of living continues to rise due to economic pressures, many individuals and families are forced to cut costs wherever they can. The cost of owning a car can be as low as $6,496 (“Living Wage Calculator - Living Wage Calculation for Seattle-Tacoma-Bellevue, WA” n.d.) to well over $11,000 a year, versus owning a bike at a modest cost of only $100 to $400 per year, depending on the estimate. For the sake of simplicity, I will assume $350 per year for the bike and add the most expensive Regional Pass on an ORCA card, costing $360 a month (although the $99 or $108 per month passes are more common) (“ORCA: Home” n.d.). That comes out to $4,670 per year ($350 + 12 × $360) when using public transportation in addition to a bike for a daily commute. Admittedly, Seattle has hills, and some people do not love the rain as much as I do, so this is where an ebike could be useful. Again, you’d need to consider a slight cost increase, but once the bike investment has been made, maintenance is largely the same as for a classic pedal bike. The only additional costs are replacing batteries every few years and regular charging (“Electric Bike Maintenance Cost” n.d.), which is minimal and dependent on current energy costs. Owning any bike, however, may not be an option for everyone, particularly for those living in one of the many small apartments around the area which do not offer safe storage options for bikes.
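The cost comparison above is simple enough to check by hand, but a short sketch makes the arithmetic explicit. The figures are the ones quoted in the paragraph; the function and variable names are only illustrative.

```python
# Figures quoted in the paragraph above. The $350 bike estimate is the
# simplifying assumption made in the text, not a measured cost.
BIKE_COST_PER_YEAR = 350
ORCA_PASS_PER_MONTH = 360
CAR_COST_LOW_PER_YEAR = 6496


def bike_plus_transit_per_year(bike_cost, pass_per_month):
    """Yearly cost of keeping a bike plus twelve monthly transit passes."""
    return bike_cost + 12 * pass_per_month


total = bike_plus_transit_per_year(BIKE_COST_PER_YEAR, ORCA_PASS_PER_MONTH)
savings_vs_cheapest_car = CAR_COST_LOW_PER_YEAR - total

print(f"Bike + transit: ${total:,} per year")
print(f"Savings vs. lowest car estimate: ${savings_vs_cheapest_car:,} per year")
```

Even with the most expensive pass, the bike and transit combination stays below the lowest car estimate; with the $99 or $108 passes the gap would be larger.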
In many Seattle neighborhoods, bright orange Lime bikes have been scattered around, parked on the edge of the sidewalk or in parking lots, offering ebike options to the community. Lime Bikes is a local bike-share company owned and operated by Uber which offers $1 unlock fees and per-minute charges for ebikes and scooters. Other companies in Seattle include Jump and Veo, but all of these companies are independently owned and operated (“Bike Share - Transportation Seattle.gov” 2021). Recently these bikes have been disappearing, after what seemed like months of sitting around and looking neglected. It appeared to me that these bikes never moved, as I saw the same graffiti-tagged bikes repeatedly on my runs through the neighborhood. Now they appear to be getting replaced by newer green, black and white electric scooters. Scooters are clearly more fun, but why this change? Maybe there is something in their ridership data that shows what riders prefer. Perhaps they have data identifying trends in bike usage that I don’t have access to, which would indicate that usage is higher than I thought.
Thus far, I have been unable to find any historical open data from these companies. While completing my Google Data Analytics Professional Certification I was exposed to Divvy Bikes, a Chicago-based bike-share company, which does offer historical open data that can be used for an exploratory analysis of how a bike-share company might be used in a neighborhood like mine. I should be clear, however: this analysis has nothing to do with Seattle. I am looking at Chicago data to understand how bike-share companies might be offering affordable transportation options, and how they might be used by riders. Divvy data collected from December 2020 through November 2021 covers their two rider types: annual membership riders and non-membership riders, the latter of which we will call Casual Riders. What this data shows is that annual membership holders use the bikes to go about regular daily activities (commuting, exercise, work or personal errands, etc.), whereas non-membership or casual riders are likely using the bikes for sight-seeing and recreation.
Divvy Bikes - The Company
Divvy is a bike-share system, started in 2013 and owned by the Chicago Department of Transportation. Starting in 2019, Lyft took over the management, while the Chicago Department of Transportation retained ownership. The system has been expanded and now spans the Evanston and Chicago areas, with pricing options that can be used by both visitor and resident riders. The fleet includes classic bikes and e-bikes to accommodate different fitness levels, which are checked out from and returned to any of Divvy's network of docking stations. At the time of this report, there were 836 stations with a fleet of up to 16,500 total bikes, with ~7,000 of those being e-bikes (“Divvy Kicks Off West Side Expansion with 3,500 Bikes, 107 ‘e-Station’ Parking Areas. Streetsblog Chicago” 2021a).
Pricing options include:
Single Ride for up to 30 minutes priced at $3.30
Day Pass for unlimited 3 hour rides within 24 hours for $15
Annual Membership for unlimited 45 minute rides priced at $108 for the year.
Single Ride and Day Pass riders are what I will consider Casual Riders, as this is consistent with the term used by Divvy Bikes to refer to non-membership riders.
Additional fees are added if:
Bike is out longer than allowed time limit: $0.15/minute
Upgrading to an ebike: $0.20/minute in Zone 1 for Casual Riders and $0.15/minute for Membership riders; there is no upgrade fee in Zone 2.
Bike is lost or stolen: $1200
Parking violations: $25
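To make the pricing concrete, here is a minimal sketch of how the time-limit fee could be computed from the prices listed above. It is not Divvy's billing logic: rounding partial minutes up is my assumption, the ebike upgrade fee is left out, and the names are only illustrative.

```python
import math

# Prices and time limits copied from the lists above.
PLANS = {
    "single_ride": {"base": 3.30, "included_min": 30},
    "day_pass": {"base": 15.00, "included_min": 180},
    "annual": {"base": 108.00, "included_min": 45},
}
OVERAGE_PER_MIN = 0.15


def overage_fee(plan, ride_minutes):
    """Fee for minutes beyond the plan's time limit.

    Assumption: partial minutes are rounded up before charging.
    """
    extra = max(0, math.ceil(ride_minutes) - PLANS[plan]["included_min"])
    return round(extra * OVERAGE_PER_MIN, 2)


def single_ride_cost(ride_minutes):
    """Total cost of one Single Ride on a classic bike."""
    base = PLANS["single_ride"]["base"]
    return round(base + overage_fee("single_ride", ride_minutes), 2)


print(single_ride_cost(25))             # within the 30 minute limit
print(single_ride_cost(40))             # 10 minutes over the limit
print(overage_fee("annual", 60))        # member ride 15 minutes over
```

Under these assumptions a 40 minute Single Ride costs $3.30 plus 10 minutes of overage, which shows how quickly a long casual ride approaches the price of a Day Pass.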
Investigation Problem
This exploratory analysis will focus on how riders use the bikes. This question can be answered by looking at the following three questions:
- Where do they get their bikes?
- What routes are they traveling?
- When are they riding?
The Data
The Divvy bikes open historical data is provided at the website https://divvy-tripdata.s3.amazonaws.com/index.html under the Divvy Data License Agreement. In this analysis I look at the historical data from the last 12 months, which includes data collected from December 2020 through November 2021, although data is available beginning in 2013 and appears to update monthly.
This dataset includes trips that are anonymized and contain:
- Ride identifiers (ride_id)
- Bike type used (rideable_type)
- Trip start date and time (started_at)
- Trip end date and time (ended_at)
- Trip start station (start_station_name), with stations identifiers (start_station_id) and locations in latitude and longitude (start_lat, start_lng)
- Trip end station (end_station_name), with stations identifiers (end_station_id) and locations in latitude and longitude (end_lat, end_lng)
According to Divvy, the data is pre-processed to remove trips taken by staff as they service or inspect the system, as well as any trips under 60 seconds, which are potentially false starts. See here for details provided by Divvy. I cleaned the data in a way that remains consistent with what was stated.
Open data for verifying station names is provided by the Chicago Data Portal Website here. This dataset tracks historical data for all bicycles and bike docks, and has 8 variables, which include:
- Station IDs (ID)
- Station Name
- Total Docks
- Docks in Service
- Status
- Latitude
- Longitude
- Location (combined latitude and longitude)
Cleaning the Data
Normal cleaning processes were completed before the analysis took place. The cleaning steps were as follows:
Checking the structure of the individual files and merging into a single data frame.
Checking for duplicates, staff rides and ride durations that were negative or less than one minute, and removing these from the data frame.
Correcting obviously incorrect station names. I did not, however, remove rides without a station name. It is clear from the website discussing fines that users occasionally return bikes without docking at a station. My assumption is that another rider would be able to check out an un-docked bike near a station without issue.
Aggregating data required the use of variables that were not provided, so the following data variables were added to the dataset:
date (yyyy-mm-dd): this will be the date at which the bike was initially checked out, extracted from started_at, then broken down into separate columns for:
- month
- day
- year
- day_of_week: a calculated field based on the actual date.
ride_length_sec (in seconds): a calculated field based on the started_at and ended_at dates.
ride_length_min (in minutes): a calculated field based on the started_at and ended_at dates.
- Some stations had to be removed as they were not consistent with the station data provided by the Chicago Department of Transportation. My assumption was that these rides were completed by staff members, so rides with start or stop locations at the following stations were removed.
- Base - 2132 W Hubbard Warehouse
- Chicago Ave & Dempster St (listed as a future site not yet operating at the time of this analysis)
- DIVVY CASSETTE REPAIR MOBILE STATION
- HUBBARD ST BIKE CHECKING (LBS-WH-TEST)
- Lyft Driver Center Private Rack
- WEST CHI-WATSON
There were two other stations that were not valid operating stations according to my dataset on station data.
The Marshfield Ave & Cortland St station: I could not determine if this was temporarily or permanently closed. As it has been a valid station for the last 7 years, all rides for this station were kept.
The Throop/Hastings Mobile Station: located on the opposite side of Fosco Park from another station. I could not find anything confirming it as an operational station, so I assumed it to be a temporary station. I relabeled all start and end stations that used this station with the closest station, the Racine Ave & 13th St station, also located at Fosco Park.
The final data frame contains 4,466,054 observations and 20 variables, and was exported as a CSV file for analysis in both R and Tableau Public.
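The cleaning and derived-field steps described above can be sketched in a few lines. This is an illustration in Python using only the standard library, not the code used for the analysis (which was done in R and Tableau Public). The timestamp format and the sample rows are assumptions made up for the example; the column names and removed station names come from the lists above.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed layout of started_at / ended_at
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Stations removed during cleaning, as listed above.
REMOVED_STATIONS = {
    "Base - 2132 W Hubbard Warehouse",
    "Chicago Ave & Dempster St",
    "DIVVY CASSETTE REPAIR MOBILE STATION",
    "HUBBARD ST BIKE CHECKING (LBS-WH-TEST)",
    "Lyft Driver Center Private Rack",
    "WEST CHI-WATSON",
}


def clean_and_enrich(rows):
    """Drop duplicate, staff-station and too-short rides; add derived fields."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        if row["ride_id"] in seen_ids:
            continue  # duplicate ride
        seen_ids.add(row["ride_id"])
        if (row["start_station_name"] in REMOVED_STATIONS
                or row["end_station_name"] in REMOVED_STATIONS):
            continue  # assumed staff ride
        start = datetime.strptime(row["started_at"], TIME_FORMAT)
        end = datetime.strptime(row["ended_at"], TIME_FORMAT)
        seconds = (end - start).total_seconds()
        if seconds < 60:
            continue  # negative duration or under one minute
        cleaned.append({
            **row,
            "date": start.date().isoformat(),
            "month": start.month,
            "day": start.day,
            "year": start.year,
            "day_of_week": DAY_NAMES[start.weekday()],
            "ride_length_sec": seconds,
            "ride_length_min": seconds / 60,
        })
    return cleaned


# Hypothetical rows, invented only to exercise each rule.
sample_rows = [
    {"ride_id": "A1", "started_at": "2021-07-04 10:00:00",
     "ended_at": "2021-07-04 10:15:00",
     "start_station_name": "Streeter Dr & Grand Ave",
     "end_station_name": "Streeter Dr & Grand Ave"},
    {"ride_id": "A1", "started_at": "2021-07-04 10:00:00",
     "ended_at": "2021-07-04 10:15:00",
     "start_station_name": "Streeter Dr & Grand Ave",
     "end_station_name": "Streeter Dr & Grand Ave"},
    {"ride_id": "B2", "started_at": "2021-07-04 11:00:00",
     "ended_at": "2021-07-04 11:00:30",
     "start_station_name": "Wells St & Elm St",
     "end_station_name": "Wells St & Elm St"},
    {"ride_id": "C3", "started_at": "2021-07-04 12:00:00",
     "ended_at": "2021-07-04 12:20:00",
     "start_station_name": "WEST CHI-WATSON",
     "end_station_name": "Wells St & Elm St"},
    {"ride_id": "D4", "started_at": "2021-07-04 13:00:00",
     "ended_at": "2021-07-04 12:50:00",
     "start_station_name": "", "end_station_name": ""},
]

cleaned = clean_and_enrich(sample_rows)
print(len(cleaned), cleaned[0]["day_of_week"], cleaned[0]["ride_length_min"])
```

Rides with an empty station name pass through the station filter, matching the decision above to keep rides that were not docked at a named station.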
The Analysis
Which Stations are Popular?
To answer this first question, I looked at how many rides started at every station. The first figure below contains two charts: on the left is a density plot of all casual rides, and on the right is the same for the membership rides. Darker stations have fewer trips, whereas the color brightens from red into yellow for stations with the highest number of trips starting there. Notice that the casual rides have a strong bright yellow spot on the waterfront, particularly at the Navy Pier, which is a popular tourist attraction, yet are less intense everywhere else. Membership riders have bright spots at multiple locations around town. A pattern that separates the two groups is clearly emerging, but more detail is needed. In the next chart I focus on the top 10 stations for each group to better understand what might be driving the higher popularity of some stations.
A surprising detail is shown in the Top 10 Popular Stations chart for Casual riders. I created 15 subdivisions, as 10 was not enough to see a difference between the stations ranked 2 through 10. The Streeter Dr & Grand Ave station, located at the Navy Pier, was used significantly more than any of the other stations for casual riders with 63,033 rides, whereas the second most popular station had just over half that number, at 33,874 rides. I would like to note here that casual rides make up 45% of all rides, and the Streeter Dr & Grand Ave station makes up 3% of those. In the chart showing membership rides, 10 divisions were sufficient to see the variation between stations, as the number of trips ranges from 15,744 at the lowest of the 10 to 23,336 at the highest. Focusing on just these 10 stations for each group, I used maps to determine what features and amenities might be in the area, generally on the streets immediately around the station. There were three stations in common between these two groups. These three stations are:
DuSable Lake Shore Dr & North Blvd Station: This station is located at a major intersection in Lincoln Park, near the water, parking, local ball fields, and many other recreational and non-recreational locations.
Wells St & Concord Ln: This station is located near shopping, hotels and apartments, and near Lincoln Park.
Wells St & Elm St: Also near restaurants and apartments. However, it is also near the Walter Payton College Preparatory High School and the Clark/Division subway station. These last two might be more important to membership riders, contributing to its higher popularity with this group.
Even considering these common stations, the pattern is clear: casual riders favor tourist and recreational sites, while membership riders heavily favor sites with parking, shopping, apartment buildings and schools.
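The station ranking described above boils down to counting rides per start station for each rider type. Here is a minimal sketch of that count, assuming each ride is a dict with a `start_station_name` field and a rider-type field that I am calling `member_casual` (an assumed column name). The sample rides are invented, while the share check at the end uses only figures quoted in the text.

```python
from collections import Counter


def top_start_stations(rides, rider_type, n=10):
    """Return the n most used start stations for one rider type."""
    counts = Counter(
        ride["start_station_name"]
        for ride in rides
        if ride["member_casual"] == rider_type and ride["start_station_name"]
    )
    return counts.most_common(n)


# Hypothetical rides, invented only to show the shape of the result.
sample_rides = (
    [{"member_casual": "casual", "start_station_name": "Streeter Dr & Grand Ave"}] * 3
    + [{"member_casual": "casual", "start_station_name": "Wells St & Elm St"}] * 1
    + [{"member_casual": "member", "start_station_name": "Wells St & Elm St"}] * 2
    + [{"member_casual": "casual", "start_station_name": ""}] * 5
)
print(top_start_stations(sample_rides, "casual", n=2))

# Share of casual rides starting at Streeter Dr & Grand Ave, from the
# figures quoted in the text (45% of 4,466,054 rides are casual).
TOTAL_RIDES = 4_466_054
CASUAL_SHARE = 0.45
STREETER_RIDES = 63_033
streeter_share = STREETER_RIDES / (TOTAL_RIDES * CASUAL_SHARE)
print(f"{streeter_share:.1%}")
```

Rides without a start station name are skipped in the ranking here, since they cannot be attributed to a station; they are still part of the overall ride totals.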
Which Routes are Popular?
Routes were determined by plotting start stations versus end stations, then focusing on popular routes. For casual riders I filtered out routes which had fewer than 800 rides. With only these routes, two clear patterns emerge. The first, more dominant pattern shows that rides tend to start and end at the same location. The second most common pattern is travel from one highly popular recreational site to another highly popular recreational site, as the more popular routes coincide with the more popular start stations. Both of these patterns show behavior typical of sight-seeing and tourism, indicating that casual riders may often be non-resident guests to the city.
In this second chart, I look at the popular routes of membership rides, filtering out routes that had been traveled fewer than 600 times. Unlike for casual riders, the top 10 popular start stations do not indicate the popularity of the route. The pronounced pattern here is that the most popular routes are short one-way trips, where the return trip is close in popularity, located away from recreational sites. Recreational routes do show that membership riders also frequently complete trips starting and ending at the same local park stations. These patterns suggest that membership riders are using the bikes to commute to and from work, run errands around town, or engage in exercise or other social activities, consistent with what a local resident might use a personal bike for.
When do they Ride?
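The route filtering above can be sketched as a count of (start station, end station) pairs with a minimum-ride threshold. This is only an illustration with invented rides; the 800 and 600 ride cutoffs from the text are replaced by a tiny threshold so the example stays readable.

```python
from collections import Counter


def popular_routes(rides, min_rides):
    """Count rides per (start, end) pair and keep routes with at least min_rides."""
    routes = Counter(
        (ride["start_station_name"], ride["end_station_name"]) for ride in rides
    )
    return {route: n for route, n in routes.items() if n >= min_rides}


def is_round_trip(route):
    """A route that starts and ends at the same station."""
    start, end = route
    return start == end


def has_popular_return(route, routes):
    """True when the reverse direction also survived the filter."""
    start, end = route
    return start != end and (end, start) in routes


# Hypothetical rides: one round trip pattern and one commute-like pair.
sample_rides = (
    [{"start_station_name": "Park", "end_station_name": "Park"}] * 4
    + [{"start_station_name": "Home", "end_station_name": "Work"}] * 3
    + [{"start_station_name": "Work", "end_station_name": "Home"}] * 3
    + [{"start_station_name": "Home", "end_station_name": "Park"}] * 1
)

routes = popular_routes(sample_rides, min_rides=3)
round_trips = [r for r in routes if is_round_trip(r)]
paired = [r for r in routes if has_popular_return(r, routes)]
print(routes)
print(round_trips, paired)
```

Round trips correspond to the dominant casual pattern, while routes whose reverse direction is about as popular correspond to the commute-like pattern seen for membership riders.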
The following charts show trips taken with respect to different time markers. The top chart looks at the trips over the entire year. For both types of riders, the number of trips increases through the warmer months before dropping again as the weather cools, but there are a couple of factors that might contribute to this. Last winter COVID trends were on the rise (same as this year); however, the vaccine had not yet been made available to the population. The other factor is that snow, ice and sleet are common in the Chicago area. This is a trend I am not likely to see in Seattle, but it is important to identify here, as the data is from a place where this sort of weather can influence how someone might ride a bike. Honestly, I still laugh at pictures of the 2008 “Snowpocalypse” re-posted during this year’s Seattle winter storm. While living in PA I routinely had snow and ice that would fall in October or November, then last all winter as “Nor’easter” storms would continually add to the un-melted portion of packed ice, which persisted until March or April. Another observation from the data is that the peaks in the number of trips for casual riders and membership riders followed different patterns. So I considered the number of trips categorized by day of the week and month to get more detail.
When categorized by month, casual rides peaked in July and August and remained high through the summer months, which corresponds with common vacation times. Membership rides were also higher in warmer months, but stayed active during the cooler fall as students returned to school. The Riders by the Day of the Week chart shows that casual riders had a preference for weekend trips, while membership riders favored the weekdays and peaked during midweek.
The final set of charts considers the Number of Riders by Hour for weekends and weekdays separately, as it is clear from the previous charts that the two groups behave differently over time. On weekdays, membership rides show a clear trend with peaks during commute times before and after school and work. The evening commute peak is higher, consistent with additional afternoon and evening activities. Casual rides rose quickly in two places, after breakfast and after lunch. It should be noted that there appears to be an underlying pattern that may indicate rides made by non-member locals, as it follows similar trends to membership riders. Weekend times, however, are about the same for both groups, who would possibly both be going about recreational activities and exercise.
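The hourly view above can be sketched by bucketing each ride's start time into weekday or weekend and counting by rider type and hour. As before, this is a standard-library Python illustration with invented rides; the timestamp format and the `member_casual` column name are assumptions.

```python
from collections import Counter
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed layout of started_at


def trips_by_hour(rides):
    """Count rides per (rider type, start hour), split into weekday/weekend."""
    counts = {"weekday": Counter(), "weekend": Counter()}
    for ride in rides:
        start = datetime.strptime(ride["started_at"], TIME_FORMAT)
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        bucket = "weekend" if start.weekday() >= 5 else "weekday"
        counts[bucket][(ride["member_casual"], start.hour)] += 1
    return counts


# Hypothetical rides: 2021-07-06 is a Tuesday, 2021-07-03 is a Saturday.
sample_rides = [
    {"member_casual": "member", "started_at": "2021-07-06 08:05:00"},
    {"member_casual": "member", "started_at": "2021-07-06 08:40:00"},
    {"member_casual": "member", "started_at": "2021-07-06 17:15:00"},
    {"member_casual": "casual", "started_at": "2021-07-06 13:30:00"},
    {"member_casual": "casual", "started_at": "2021-07-03 13:10:00"},
    {"member_casual": "member", "started_at": "2021-07-03 13:45:00"},
]

hourly = trips_by_hour(sample_rides)
print(hourly["weekday"])
print(hourly["weekend"])
```

Plotting each counter by hour, one line per rider type, reproduces the kind of weekday and weekend charts discussed above.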
Final Conclusions
In this analysis I set out to investigate how riders used Divvy bike-share bikes, to better understand how bike-share companies are used in city neighborhoods. Considering both casual riders and membership riders, I focused on three questions which could be answered based on the data. These three questions asked where riders were riding, what routes they were taking, and when they were riding, and led to the following conclusions:
Casual rides made up 45% of rides, and were likely casual sight-seeing trips. These riders commonly favored weekends and summer, at and around the Navy Pier. Many of these rides were likely taken by non-resident guests of the city who could benefit from short-term payment options.
Membership rides made up 55% of rides, and these riders were likely commuting or carrying out their daily activities and exercise when the weather permitted safe use and travel.
Seattle, like Chicago, has many recreational and tourist sites which attract both tourists and locals. The pandemic forces us to mask up while going to the gym, which can be uncomfortable and too much for many people. Outdoor activities are a great way to keep active. Seeing how Divvy bikes in Chicago were used over the last year by both casual riders and membership riders shows that bike-share companies offer a great, affordable and safe option for transportation, recreation and exercise.
Other Questions
As an outsider to Chicago, I love that these bikes are widely available to the community, and I can see how they are being used in Chicago. I can also see how this sort of system could be applied to other cities like Seattle. This analysis has helped me to truly understand how bike-share systems could be useful in my local community, despite my inability to obtain local data. There were some other questions I would have liked to investigate that might have added to my general understanding; however, the data needed to answer them was either missing or very difficult to extract from the available data. Additionally, there were a few things that pointed to some potential issues with the Divvy system.
1. While investigating the station locations, I referred often to Google Maps in order to look at the street-level images. Images and negative reviews showed me that there may be an issue with the stocking of classic bikes near residential locations, the kind of availability that would convince locals to purchase membership passes. In another analysis, it might be helpful to do a more detailed station evaluation to understand how stocked, or understocked, a station might be. This was not part of the question I was asking, but would be an interesting investigation focusing on the Membership rider. This analysis could even go further to get more detailed data on what amenities are nearby or how the demographics are distributed around the stations.
2. As part of this analysis I considered views from every variable in my final exported dataset. While analyzing trip duration, I became convinced that inexperienced riders venturing out for the first time may not understand the locking mechanisms, or the requirement to lock the bikes at an appropriate Divvy station. February had the lowest number of trips taken by both groups, likely due to weather in the area, yet it had the highest number of outliers in trip duration. There were multiple casual rides which lasted for more than 3 weeks, whereas the largest outlier for membership riders was about a day. I feel it is unlikely that any ride actually lasted that long, which suggests that there may be an issue with the locking mechanism failing, particularly since Divvy charges $1200 for a lost or stolen bike after only 24 hours.
3. When analyzing routes, I found that the latitude and longitude coordinates for routes that started and ended at the same station did not actually fall on the same point. Some small variation is expected, but at times there were trips that went completely across the park and were still considered the same station. When I compared this with bike types, which should have included only classic and electric bike options, I found there was also a docked bike category. I could not track down what a docked bike was, so ultimately I assumed that some bikes were not properly docked. In reading about the expansions in “Divvy Kicks Off West Side Expansion with 3,500 Bikes, 107 ‘e-Station’ Parking Areas. Streetsblog Chicago” (2021b), a comment on this article makes it clear that there are separate docking systems for classic bikes vs ebikes. Inexperienced riders may not truly understand the system, or that bikes need to be docked at the correct station. The website would lead me to believe that any Divvy station would work.
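One way to follow up on the third point would be to measure how far apart the start and end coordinates are for trips recorded as starting and ending at the same station. The sketch below uses the haversine great-circle formula; the 100 meter threshold and the sample coordinates are assumptions chosen only for illustration, not values from the data.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def far_apart_same_station(rides, max_gap_m=100):
    """Same-station trips whose start and end points are over max_gap_m apart.

    max_gap_m is a hypothetical tolerance for normal GPS and dock spread.
    """
    flagged = []
    for ride in rides:
        if ride["start_station_name"] != ride["end_station_name"]:
            continue
        gap = haversine_m(ride["start_lat"], ride["start_lng"],
                          ride["end_lat"], ride["end_lng"])
        if gap > max_gap_m:
            flagged.append((ride["ride_id"], round(gap)))
    return flagged


# Hypothetical rides with invented coordinates.
sample_rides = [
    {"ride_id": "near", "start_station_name": "Park", "end_station_name": "Park",
     "start_lat": 41.8900, "start_lng": -87.6100,
     "end_lat": 41.8901, "end_lng": -87.6100},
    {"ride_id": "far", "start_station_name": "Park", "end_station_name": "Park",
     "start_lat": 41.8900, "start_lng": -87.6100,
     "end_lat": 41.8950, "end_lng": -87.6100},
]
flagged = far_apart_same_station(sample_rides)
print(flagged)
```

Cross-tabulating the flagged trips against `rideable_type` would be one way to check whether the docked bike category lines up with bikes that were left away from the dock.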