Exercise 2: CartoDB

My goal in this assignment was to familiarize myself with the resources available through the NYC Open Data portal, as well as explore CartoDB’s mapping tools. As an “avid cyclist” myself, and a former resident of Brooklyn, I knew I wanted to look at bicycling in New York City.

My first instinct was to examine road conditions using the 311 Service Requests dataset. One classic stereotype of New Yorkers is that they love to complain, and at first glance the data seemed to validate this cliché: even after restricting my request to 311 complaints about street condition, I found more than 566,000 records since the beginning of 2010. That’s a lot of complaints! Viewed differently, however, maybe that’s just a lot of people: there were only 0.012 street condition service requests per person per year.

The large majority (60%) of NYC 311 street condition complaints were requests for pothole repairs. (Pothole repair is a major NYC concern, and the NYC DOT’s pothole squad even has its own Tumblr.) My intuition was that a map of street condition service requests would serve as a useful proxy for pavement quality, weighted by use (more people using a street means more people to complain about its deficiencies).

Unfortunately, I was stopped in this line of analysis by CartoDB’s limitations. The full 311 dataset was much too large to fit within the service’s free account quotas. Even after I limited the date range to the first half of 2013, CartoDB rejected my dataset and gave me no next steps to try. I still have the raw data, so I may attempt to work with it in the future.
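For anyone who wants to try the same trimming step offline, here is a minimal sketch of cutting a large CSV export down to a date range before uploading it. It is not what I actually ran. The column name "Created Date" and the timestamp format are assumptions about the export, and the function names are purely illustrative, so check all of them against the real file.

```python
import csv
from datetime import datetime

# Assumed column name and timestamp format; verify against the actual export.
DATE_COLUMN = "Created Date"
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def in_range(row, start, end):
    """Return True if the row's creation timestamp falls in [start, end)."""
    created = datetime.strptime(row[DATE_COLUMN], DATE_FORMAT)
    return start <= created < end


def filter_csv(src_path, dst_path, start, end):
    """Copy rows created in [start, end) to a smaller CSV; return how many were kept."""
    kept = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if in_range(row, start, end):
                writer.writerow(row)
                kept += 1
    return kept
```

Calling filter_csv with datetime(2013, 1, 1) and datetime(2013, 7, 1) would keep the first half of 2013. Reading row by row keeps memory use flat no matter how large the source file is.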

What I ended up doing was considering the relationship between bicycling and public transit. Bicycling can potentially be a useful solution to the last-mile problem of transit (efficiency dictates that the number of stops be limited, but then distance from stop to final destination may be high, and walking is relatively slow).

I used datasets for subway stations and bike racks from NYC Open Data. With these datasets, the process was simple – the website allowed me to download zipped ESRI shapefiles, which I was then able to upload to CartoDB without any trouble.

New York’s 469 subway stations will be immediately familiar to many observers, but its many CityRacks (17,680 in number as of June 2013) are less familiar. This Department of Transportation program allows individuals and businesses to request the installation of bike racks anywhere in the city. Thus, we can reasonably expect that the location of CityRacks will roughly correspond to those places where lots of people who are aware of City resources and processes want to bike. (The dataset does not, however, include the innumerable fences, sign posts, and parking meters that also serve as bike parking in New York.)

My first map displays subway stations and bike racks as points and bubbles.

I chose complementary colors (blue and orange) to represent subway stations and bike racks respectively. Although one could conceivably symbologize subway stations according to average daily boardings, I chose not to, both because those figures weren’t in the dataset I downloaded and because such a symbology would only further heighten the map’s emphasis on the central business district, where transit ridership is highest. I did, however, apply a light bubble symbology to the bike rack data, because each point represents anywhere from one to several dozen bike racks (say, a bike corral). Because the large majority of points correspond to one or two racks, I kept the bubble size gradations limited so as not to attract excessive attention to the few larger installations.

This map suggests that CityRack installations are generally close to subway stations, but that they are much more prevalent in some areas than in others. In particular, they appear denser in Manhattan and parts of northern and downtown Brooklyn.
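One way to put a number on "close" would be to compute each rack's distance to its nearest subway station. The following is only a sketch of that idea, not something I ran for these maps. It assumes both datasets have been reduced to (latitude, longitude) pairs in decimal degrees, and the function names are my own.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def nearest_station_m(rack, stations):
    """Distance in meters from one rack (lat, lon) to the closest station in the list."""
    return min(haversine_m(rack[0], rack[1], s[0], s[1]) for s in stations)
```

This brute-force version compares every rack against every station, which is fine at this scale (tens of thousands of racks, a few hundred stations). A spatial index would be the usual choice for anything much larger.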

My next map attempts to better display the distribution of CityRacks using a density visualization.

Viewing this map zoomed out shows how the density of bike racks diminishes with distance from the city center, and viewing it while zoomed in highlights the particular density of racks in Manhattan and in Brooklyn north of Prospect Park. We can also see how the bike rack density cells seem to emanate from transit stations. In this way, the density visualization helps confirm our anecdotal observations from the point/bubble map.
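To make the idea behind a density view concrete, here is a minimal sketch of binning points into square grid cells and summing the racks in each. This is not CartoDB's implementation, which I have not inspected. It assumes each point is an (x, y, racks) tuple in some projected coordinate system, and grid_counts is a hypothetical helper name.

```python
from collections import Counter
from math import floor


def grid_counts(points, cell_size):
    """Sum rack counts per square cell; each cell is keyed by (column, row) index."""
    counts = Counter()
    for x, y, racks in points:
        # floor() keeps negative coordinates in the correct cell.
        cell = (floor(x / cell_size), floor(y / cell_size))
        counts[cell] += racks
    return counts
```

Weighting by the rack count rather than counting points means a bike corral contributes more than a single rack, which matches what the bubble sizes on the first map are meant to show.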

These maps may not be the perfect analytical tool. One key problem lies in determining whether the density of CityRacks is a direct function of proximity to transit, or whether some other variables are mediating the relationship. Population density itself comes to mind (more people live near the subways, because transit is attractive), as do affluence and the likelihood of working within a bike-able distance.

Further analysis is warranted. In particular, I am interested in examining the relationship between bike rack placement and sociodemographic variables including race/ethnicity, median household income, and time at current address.

Excelsior!

I hope CP255 will be an ideal environment for me to accumulate digital skills and build shiny example projects. I’m also hoping to draw on some of these skills in near-real time to build public-facing outreach and engagement tools for my PR/CR.