The California Highway Patrol collects data from local police departments about all serious vehicular collisions. I downloaded their 2014 data, and filtered for incidents that involved bicycles. Since most incidents did not include geocoordinates, I used Google's Geocoding API to find the latitude and longitude of each crash (see below for the limitations of this approach).
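A minimal sketch of what a single geocoding lookup could look like, using only the Python standard library. The endpoint and response layout follow Google's JSON Geocoding API; the function names (`build_geocode_url`, `parse_location`, `geocode`) and the idea of passing a plain address string are assumptions made for illustration, not the code actually used for this app.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Return the request URL for one address lookup."""
    return GEOCODE_URL + "?" + urlencode({"address": address, "key": api_key})


def parse_location(payload):
    """Return (lat, lng) from a decoded API response, or None if no match."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def geocode(address, api_key):
    """Look up one address over the network (requires a valid API key)."""
    with urlopen(build_geocode_url(address, api_key)) as response:
        return parse_location(json.load(response))
```

Returning `None` for anything other than an `OK` status makes it easy to collect the crashes that could not be located and review them by hand.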
Data on the population size and number of bike commuters in each city and county come from the US Census Bureau. I used their API to get 2013 American Community Survey 5-year estimates for these values (tables B01003 and B08301, respectively).
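As a rough illustration, a Census API request for these two tables might be built as follows. The endpoint path, the choice of `B01003_001E` (total population) and `B08301_018E` (workers commuting by bicycle), and the helper names are assumptions; the exact variables and geographies queried for this app are not stated above.

```python
from urllib.parse import urlencode

# Assumed endpoint for the 2013 ACS 5-year estimates.
ACS_URL = "https://api.census.gov/data/2013/acs5"
# Assumed variables: place name, total population, bicycle commuters.
VARIABLES = ["NAME", "B01003_001E", "B08301_018E"]


def build_acs_url(geography="place:*", state_fips="06"):
    """Build a query for every place (or county) in one state; 06 is California."""
    params = {
        "get": ",".join(VARIABLES),
        "for": geography,
        "in": "state:" + state_fips,
    }
    # Keep commas, colons and asterisks readable instead of percent-encoding them.
    return ACS_URL + "?" + urlencode(params, safe=",:*")


def rows_to_records(rows):
    """Convert the API's list-of-lists JSON (header row first) into dicts."""
    header, *data = rows
    return [dict(zip(header, row)) for row in data]
```

Passing `geography="county:*"` would request the same values for counties instead of cities. The API returns every value as a string, so the counts need converting to integers before any arithmetic.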
I identified "Danger Zones" by finding areas where at least 4 crashes had occurred within 250 meters of each other. Thanks to Mindy Huang for suggesting and helping me design a recursive algorithm to find these zones.
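The exact algorithm is not spelled out here, so the following is only one plausible reading of it: start a 250-meter circle at each crash, recursively re-center the circle on the average position of the crashes it contains until the membership stops changing, and keep circles holding at least 4 crashes. All function names are hypothetical, and the coordinates in any usage would be made up.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters
RADIUS_M = 250            # distance cutoff from the text
MIN_CRASHES = 4           # crash-count cutoff from the text


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def crashes_within(center, points):
    """Indices of all crashes inside the circle around `center`."""
    return frozenset(i for i, p in enumerate(points) if haversine_m(center, p) <= RADIUS_M)


def refine(center, points, seen):
    """Recursively re-center the circle until its membership repeats."""
    members = crashes_within(center, points)
    if not members or members in seen:
        return members
    seen.add(members)
    lat = sum(points[i][0] for i in members) / len(members)
    lon = sum(points[i][1] for i in members) / len(members)
    return refine((lat, lon), points, seen)


def find_danger_zones(points):
    """Return each distinct zone as a sorted list of crash indices."""
    zones = []
    for start in points:
        members = refine(start, points, set())
        if len(members) >= MIN_CRASHES and members not in zones:
            zones.append(members)
    return [sorted(zone) for zone in zones]
```

This sketch compares every crash with every other crash on each pass, which is slow for thousands of incidents; a spatial index would be the usual remedy. Zones found this way can also overlap.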
Since not all collisions involve a police response or are reported to the CHP, this app does not include every bike crash in California. Still, it should contain enough data to provide a decent picture of where crashes typically occur.
Due to issues with how the locations of each crash were formatted, the Geocoding API struggled to accurately locate some of the incidents. I've tried to correct obviously incorrect locations, but didn't have the time to manually review all 13,000 incidents. Therefore, the locations shown on the map may not always be correct. Refer to the text for the most accurate description of the location.
The map shows the intersection that was used as a location reference in the raw data. Many collisions, however, did not actually occur at an intersection, but rather some distance from it (for example, 200 feet west). Again, refer to the text for the most accurate description of the location.
The Census data only counts people who commute to work via bike, so it may not be an accurate representation of how many people regularly bike in a given community. Therefore, treat the "collisions per 1,000 bike commuters" as just a rough indication of the ratio of collisions to total bikers.
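For reference, the rate is simply the number of collisions divided by the number of bike commuters, scaled to 1,000. A minimal sketch, with a hypothetical function name and an assumed choice to return `None` where the Census reports no bike commuters:

```python
def collisions_per_1000_commuters(collisions, bike_commuters):
    """Collisions per 1,000 bike commuters, or None if there are no commuters."""
    if bike_commuters <= 0:
        return None  # avoid dividing by zero for places with no reported commuters
    return 1000 * collisions / bike_commuters
```

With made-up numbers, 12 collisions in a city with 3,000 bike commuters gives a rate of 4.0. Small communities can produce very large rates from only a handful of crashes, which is another reason to read the figure as a rough indication.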
Both the 4-crash and 250-meter cutoffs were chosen arbitrarily, and may not be the best way to identify particularly dangerous locations. Moreover, my approach only flags incidents that cluster in a circle, rather than along a stretch of road.
- Data cleaning and analysis were conducted in Python, with heavy use of the Pandas library.
- The web app is powered by Flask, a Python micro-framework.
- The Frozen-Flask library generates static webpages that load faster (since they don't need to query the massive incidents database).
- The maps are created using Leaflet.js, with custom map tiles designed in Mapbox Studio.
- The charts are generated using Chartist.js.