Exercise 6: APIs

My map presents the spatial location of more than 16,000 tweets pertaining to Uber in the New York City area, overlaid on the median household income for each Census tract in the city.

I have worked with the Census API before. Last fall, I wrote a flexible Python script that read in a dictionary of variables from a CSV, then queried the Census API for those variables for specified geographies of interest. Given the narrower scope of today’s task, however, I chose to use the preexisting census library from Sunlight Labs. After consulting the Social Explorer data dictionary, I was easily able to pull data on median household income (table B19013) for all Census tracts in the five counties of New York City.
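For readers who want to see the shape of such a request, here is a minimal sketch that builds the query URLs by hand with the standard library. It is not the script or the census library call described above. The ACS 5-year endpoint path, the default year of 2013, and the function name tract_income_url are assumptions made for illustration; the county FIPS codes are the standard ones for the five boroughs.

```python
from urllib.parse import urlencode

# County FIPS codes for the five boroughs (New York State is FIPS 36).
NYC_COUNTIES = {
    "Bronx": "005",
    "Kings": "047",      # Brooklyn
    "New York": "061",   # Manhattan
    "Queens": "081",
    "Richmond": "085",   # Staten Island
}


def tract_income_url(county_fips, year=2013, api_key=None):
    """Build a URL requesting B19013_001E for every tract in one county.

    The year and the endpoint path are assumptions; adjust them to the
    ACS release actually being used.
    """
    params = {
        "get": "B19013_001E",                    # median household income estimate
        "for": "tract:*",                        # all tracts...
        "in": f"state:36 county:{county_fips}",  # ...within this county
    }
    if api_key:
        params["key"] = api_key
    base = f"https://api.census.gov/data/{year}/acs/acs5"
    # Keep ':' and '*' readable; the space in "in" is encoded as '+'.
    return base + "?" + urlencode(params, safe=":*")


# One URL per borough; fetching them is left out of this sketch.
urls = {name: tract_income_url(fips) for name, fips in NYC_COUNTIES.items()}
```

Each URL could then be fetched with any HTTP client. The response is expected to be a JSON array whose first row holds the column names, but that should be checked against the live API.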

In CartoDB, I visualized the median household income data using a choropleth map based on a shoreline-clipped Census tract shapefile I obtained from Bytes of the Big Apple. In order to join my Census data to the shapefile (which did not come with standard FIPS codes), I needed to create an unusual join field consisting of one digit for borough and six digits for Census tract.
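The join field described above can be sketched as follows. This assumes the borough digit follows New York City's usual numbering (1 Manhattan, 2 Bronx, 3 Brooklyn, 4 Queens, 5 Staten Island) and that the Census data carries a three-digit county FIPS code and a tract code; the function name boro_tract_key is hypothetical.

```python
# Map each county FIPS code to the city's one-digit borough code (assumed numbering).
COUNTY_TO_BORO = {
    "061": "1",  # New York County -> Manhattan
    "005": "2",  # Bronx County    -> Bronx
    "047": "3",  # Kings County    -> Brooklyn
    "081": "4",  # Queens County   -> Queens
    "085": "5",  # Richmond County -> Staten Island
}


def boro_tract_key(county_fips, tract):
    """Return a 7-character key: one borough digit plus a six-digit tract code."""
    # zfill restores leading zeros if the tract code was read as a shorter string.
    return COUNTY_TO_BORO[county_fips] + str(tract).zfill(6)


print(boro_tract_key("061", "000100"))  # Manhattan, tract 000100
```

Building the key as a string rather than a number matters: tract codes with leading zeros would otherwise be shortened and fail to match the shapefile.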

I used the Twitter API to collect nearly 25,000 tweets pertaining to Uber within a rectangular bounding box containing the five boroughs of New York City, plus other nearby areas. I collected data during the prime window for leisure Uber use: 7 pm Saturday until 3 pm Sunday (bars and brunches). Because I am interested only in NYC, I then used a spatial selection in ArcMap to limit my dataset to the more than 16,000 tweets geotagged within the five boroughs.
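To illustrate why the rectangular box catches more than the city itself, here is a minimal sketch of a bounding-box test. The corner coordinates are rough illustrative values, not the ones used for the collection, and the name in_bbox is hypothetical. A rectangle inevitably includes parts of New Jersey and neighboring counties, which is why a second, polygon-based selection is still needed afterward.

```python
# Approximate box around the five boroughs (illustrative values only).
# Order: west, south, east, north, in decimal degrees.
NYC_BBOX = (-74.26, 40.49, -73.70, 40.92)


def in_bbox(lon, lat, bbox=NYC_BBOX):
    """Return True if the point lies inside (or on the edge of) the box."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


# Hypothetical sample points as (label, longitude, latitude).
points = [
    ("Midtown Manhattan", -73.9857, 40.7484),
    ("Jersey City", -74.0776, 40.7282),   # inside the box, outside NYC
    ("Philadelphia", -75.1652, 39.9526),  # outside the box
]
kept = [label for label, lon, lat in points if in_bbox(lon, lat)]
```

In this sketch the Jersey City point passes the box test even though it is not in New York City, which mirrors the drop from nearly 25,000 collected tweets to the 16,000 that fall within the boroughs.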

My hypothesis in conducting this analysis was that areas of greater median household income would see more frequent mention of Uber in nearby tweets – that is, that a data analysis would help substantiate the idea that Uber is more prevalent among higher-income individuals. In a general sense, my map does indeed demonstrate this spatial correlation.

However, my analysis is not without flaws, some serious:

  • I did not normalize my Uber tweet data. That is, Uber may be equally popular throughout New York while Twitter itself is used more frequently in higher-income areas. A better approach might be to capture all geotagged tweets and calculate the proportion that mention Uber within a given area.
  • The map faces some significant visualization challenges. In particular, I struggled to find a combination of settings that would make both the income data (choropleth) and the tweet locations (intensity) comprehensible at the same time. It is possible that the data would be better presented in two or more adjacent maps, though I think my visualization functions fairly well at some zoom levels, at least.
  • I had no good way of discerning the way in which Uber was mentioned in a given tweet. Some tweets were directed at @uber, some used the hashtag #uber, some used the word in passing, and some appeared not to contain the term at all – rather, they linked to websites that did.
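Two of these flaws lend themselves to a short sketch: sorting tweets by how they mention Uber, and computing the share of all geotagged tweets in a tract that mention it. The following is a rough illustration and not part of the analysis above. The category labels, the function names, and the input format of (tract key, tweet text) pairs are all assumptions, and simple pattern matching of this kind will misclassify some tweets.

```python
import re
from collections import Counter


def mention_type(text):
    """Classify how a tweet mentions Uber, using simple pattern matching."""
    lowered = text.lower()
    if re.search(r"@uber\w*", lowered):
        return "directed"      # addressed to an Uber account
    if re.search(r"#uber\w*", lowered):
        return "hashtag"
    if re.search(r"\buber\b", lowered):
        return "in passing"
    return "not in text"       # e.g. the term appears only on a linked page


def uber_share_by_tract(tweets):
    """Return the share of tweets per tract whose text mentions Uber.

    tweets: iterable of (tract_key, text) pairs covering ALL geotagged
    tweets, not just those already filtered for Uber.
    """
    totals, uber = Counter(), Counter()
    for tract, text in tweets:
        totals[tract] += 1
        if mention_type(text) != "not in text":
            uber[tract] += 1
    return {tract: uber[tract] / totals[tract] for tract in totals}


# Hypothetical sample data.
sample = [
    ("1000100", "need an uber after brunch"),
    ("1000100", "beautiful morning in the park"),
    ("3001500", "rainy day"),
]
shares = uber_share_by_tract(sample)
```

Mapping these shares instead of raw tweet counts would separate interest in Uber from overall Twitter activity in each tract.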

Nevertheless, both as a proof of concept for working with the Census and Twitter APIs and as a graphical representation of Uber-mania in New York, I believe my map is a substantive success.