Renting and Commuting in the Bay Area: A Final Project

By: Nancy Stetson and Cassandra Bayer

Question: The Effect of Commute Times on Rental Prices

The term “housing crisis” has become commonplace amongst Bay Area residents. The influx of businesses into the region has driven up rental prices, squeezed the market’s available housing stock, and increased the population. Our initial hypothesis posited that the increased cost of living has driven people employed in city centers (e.g., San Francisco) to live in more suburban areas (e.g., Oakland) to find less costly housing.

Our questions investigate how the ease of commuting affects rental prices, whether any savings on rent are eroded by longer commutes, and whether there are areas in the Bay Area that are both low-rent and low-commute.

Background: Trends Over Time

Between 1980 and 2010, the housing market (on a national scale) rose and fell. San Francisco stands out from the pack even when compared with similar cities. The graph below shows rental affordability, the percentage of median income that would be spent on the median rent, in Atlanta, Chicago, New York City, and San Francisco. San Francisco, highlighted in red, largely mirrored the other cities’ trends until 2010. After that, its index grew far faster than theirs, and in recent years San Francisco has overtaken New York City as the most expensive place to live in the United States.

Source: Zillow Data, Mortgage Affordability, Rental Affordability, Price-to-Income Ratio through Q3 2016

To understand where residents of the Bay Area are choosing to live, data must be disaggregated at a localized level. For example, as exhibited in the chart below, rents vary significantly across the Bay Area. Vallejo and San Francisco are only 31 miles apart, but Vallejo is far more affordable than San Francisco.

Source: Zillow Data

Our hypothesis, then, is that this rent disparity is created by high commuting costs. If the commute from Vallejo to the region’s economic center is long, lower rents may not be worth it, because commuting costs erode the rent savings.


Our general plan for this project was to combine the cost of renting with the cost of commuting, so we could better explore the tradeoffs made by commuters in the Bay Area.


We use three main sources of data. The first is a dataset of Craigslist postings collected by the Urban Analytics Lab in 2014 and provided by Geoff Boeing. We combined these listings with commute data, using a simulated dataset of commute destinations from the Metropolitan Transportation Commission (MTC). To find commute times, we made requests to the Google Directions API.

The MTC data uses Traffic Analysis Zones (TAZ). TAZ are constructed from census blocks and typically capture important information such as the number of cars per household, income, and employment within each zone. The Bay Area has approximately 1,500 zones, which are aggregated into 34 “Super Districts.” Much of our analysis centered on these zones: we found average commute times for each zone, based on commutes from that zone to common super district destinations.

The following map displays the geography of the MTC dataset.

The bulk of this project involved organizing and aggregating the commute destinations into a meaningful dataset and applying those findings to the rent data. This produced two datasets: one with an observation for each TAZ, including its average commute and median rent, and another with an observation for each rental listing, carrying the same information. The first dataset was used to map which areas are more or less expensive to live in once both rent and commutes are taken into account. The second was used to regress rents on commute times, to assess whether a cost of commuting is revealed by how rents vary across different levels of accessibility to economic centers.

While we relied heavily on Python and Carto for this project, we used a blend of tools to manipulate, visualize, and analyze the data.


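Tools used for visuals, data cleaning, and analysis:

  • Python, version 2.7, Anaconda distribution
  • Jupyter Notebooks (visuals, data cleaning, and analysis)
  • R, version 3.3.1
  • RStudio (visuals, data cleaning, and analysis)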



The central challenge for the project was finding commute times that account for traffic. We wanted to include traffic in our analysis because it can be a significant barrier to transportation; for example, it is not uncommon for commute times across the Bay Bridge to double or triple during peak hours. To find commute times in the Bay Area, we used the Google Directions API, which draws on Google Maps’ traffic predictions for specific times and days.

The Google service allowed us to find travel times for both driving and public transportation between pairs of locations. Our goal was to approximate the average commute time for a given Craigslist rental listing. To do this, we determined where people commute to using MTC data, which records both their home traffic analysis zone and their destination zone. Because of the limits of the Google API, we aggregated destinations into their super districts, the larger areas MTC uses to estimate commutes. For each home zone, we found travel times to the centroids(1) of the super districts that were popular commute destinations for that zone.

The following map illustrates the method we used to find average commute times. It shows the commute routes from a single zone in West Berkeley to popular destinations; the color of each line denotes the popularity of the route. Again, because of the limits of the Google API, we restricted our analysis to routes with at least 20 commuters in the simulated MTC data.
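The aggregation step can be sketched roughly as follows. This is a minimal illustration, not the project’s actual code: it assumes the simulated MTC trips arrive as one (home TAZ, destination super district) pair per commuter and that each listing carries a super district plus latitude/longitude. The function names and data layout are hypothetical. It counts commuters per route, drops routes below the 20-commuter threshold, and computes listing-weighted centroids (footnote 1) as the mean coordinate of the listings in each super district.

```python
from collections import Counter, defaultdict

MIN_COMMUTERS = 20  # routes with fewer simulated commuters are dropped


def popular_routes(trips, min_commuters=MIN_COMMUTERS):
    """Count commuters per (home TAZ, destination super district) route and
    keep only routes with at least `min_commuters` commuters.

    `trips` is an iterable of (home_taz, dest_super_district) pairs, one per
    simulated commuter (a hypothetical layout of the MTC data)."""
    counts = Counter(trips)
    return {route: n for route, n in counts.items() if n >= min_commuters}


def listing_weighted_centroids(listings):
    """Average the coordinates of rental listings in each super district, so
    dense clusters of housing pull the destination point toward them.

    `listings` is an iterable of (super_district, lat, lon) tuples."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # lat sum, lon sum, count
    for district, lat, lon in listings:
        s = sums[district]
        s[0] += lat
        s[1] += lon
        s[2] += 1
    return {d: (lat_sum / n, lon_sum / n) for d, (lat_sum, lon_sum, n) in sums.items()}


if __name__ == "__main__":
    trips = [(101, 4)] * 25 + [(101, 7)] * 5
    print(popular_routes(trips))  # {(101, 4): 25}
```

Averaging listing coordinates is one simple way to weight a centroid by listings; a GIS tool’s weighted-mean-center function would serve the same purpose.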

After we found the destinations of commuters, we used the Google API to find commute times between the respective start and end locations. To capture peak traffic, we set the arrival time to 9 am on Wednesday, December 14, 2016. The API has settings that control the mode of transportation, as well as the traffic model used for driving directions. For each route we collected both the optimistic and pessimistic driving times, as well as the time the route would take on public transit. We averaged the optimistic and pessimistic driving times to construct an overall driving(2) time.
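A request of this kind might be built and parsed as in the sketch below, written for Python 3 rather than the 2.7 used in the project. It reflects our understanding of the public Directions API web service and is not the authors’ code. One caveat: as we read Google’s documentation, `traffic_model` (`best_guess`, `optimistic`, `pessimistic`) applies only to driving requests that specify a `departure_time` (which must be now or in the future), while `arrival_time` is accepted for transit. The sketch therefore uses a 9 am departure for driving and a 9 am arrival for transit. Check the current documentation before relying on it. The helper names and the placeholder API key are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

DIRECTIONS_URL = "https://maps.googleapis.com/maps/api/directions/json"

# 9 am Pacific Standard Time (UTC-8) on Wednesday, December 14, 2016.
PEAK = datetime(2016, 12, 14, 9, 0, tzinfo=timezone(timedelta(hours=-8)))


def request_url(origin, destination, mode, api_key, traffic_model=None):
    """Build a Directions API request URL for one (lat, lon) origin/destination pair."""
    params = {
        "origin": f"{origin[0]},{origin[1]}",
        "destination": f"{destination[0]},{destination[1]}",
        "mode": mode,
        "key": api_key,
    }
    ts = int(PEAK.timestamp())  # seconds since the Unix epoch
    if mode == "driving":
        # traffic_model is only honored for driving requests with a departure_time
        params["departure_time"] = ts
        params["traffic_model"] = traffic_model or "best_guess"
    else:
        params["arrival_time"] = ts  # arrival_time applies to transit requests
    return DIRECTIONS_URL + "?" + urlencode(params)


def trip_minutes(response):
    """Extract the trip time, in minutes, from a parsed JSON response."""
    leg = response["routes"][0]["legs"][0]
    # duration_in_traffic appears only for driving requests that model traffic
    duration = leg.get("duration_in_traffic", leg["duration"])
    return duration["value"] / 60.0  # "value" is in seconds


def driving_minutes(optimistic, pessimistic):
    """Overall driving time: the mean of the optimistic and pessimistic estimates."""
    return (optimistic + pessimistic) / 2.0
```

Fetching the URL (for example with `urllib.request.urlopen` and `json.load`) is left out so the sketch runs offline.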

The distribution of average commutes, for both driving and transit, is shown below. The average driving commute is around 20 minutes, while the average transit commute is much longer, at about an hour.



To create an overall average commute time for each zone, we combined the driving and transit times using a rough split of 25% transit and 75% driving, based on a PPIC survey of the Bay Area. Our analysis did not account for variation in the popularity of transit across the Bay Area, which is likely significant, but we felt this was a reasonable simplification.
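The weighting itself is simple arithmetic. The sketch below applies the 75/25 driving/transit split to each route and then averages routes within a zone. Weighting routes by their number of commuters is our assumption; the text does not say how routes were averaged within a zone. The route layout is hypothetical.

```python
DRIVE_SHARE = 0.75    # rough mode split from the PPIC survey cited above
TRANSIT_SHARE = 0.25


def combined_minutes(driving, transit):
    """Blend driving and transit times for one route using the fixed mode split."""
    return DRIVE_SHARE * driving + TRANSIT_SHARE * transit


def zone_average(routes):
    """Commuter-weighted average of combined commute times for one home zone.

    `routes` is a list of (commuters, driving_min, transit_min) tuples
    (a hypothetical layout)."""
    total = sum(n for n, _, _ in routes)
    return sum(n * combined_minutes(d, t) for n, d, t in routes) / total


# Two hypothetical routes from one zone: 30 commuters and 10 commuters.
print(zone_average([(30, 20.0, 60.0), (10, 40.0, 80.0)]))  # 35.0
```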

Finally, we wanted to compare the cost of rent to the cost of commuting. To do that, we had to convert commute time into a dollar cost. We estimated the cost of commuting from its time cost, assuming that every hour spent commuting is an hour a person could instead spend on productive work, so they miss out on those earnings. Although commuting carries other costs, including gas, car maintenance, and public transit fares, the time cost transfers directly from zone to zone based on the estimated commute times. We based the value of a person’s time on the median earnings of an individual in the Bay Area. This gives a time cost of approximately $20 per month for each additional one-way minute of commute.
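The conversion from minutes to dollars can be written out explicitly. The sketch assumes two one-way trips per workday and about 21 workdays per month, so one extra one-way minute costs about 42 minutes of time per month. The median earnings figure behind the $20 estimate is not given in the text, so the hourly wage is left as a parameter. For illustration only, a wage of about $28.60 per hour reproduces roughly $20.

```python
def monthly_cost_per_one_way_minute(hourly_wage, workdays_per_month=21, trips_per_day=2):
    """Monthly value of the time lost to one extra minute on a one-way commute.

    Assumes the extra minute is incurred on every trip (to and from work)
    on every workday; both defaults are assumptions, not figures from the text."""
    extra_minutes_per_month = trips_per_day * workdays_per_month
    return hourly_wage / 60.0 * extra_minutes_per_month


# Hypothetical wage chosen to show the arithmetic behind a ~$20 figure.
print(round(monthly_cost_per_one_way_minute(28.60), 2))  # 20.02
```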

We then visualized our findings using Carto. We did further analysis to find the cost of commuting implicit in the geographic and price variation of rental listings.


Our initial findings surprised us: the most common commute patterns were within a person’s own home TAZ. This means that, on average, people live closer to work than we initially hypothesized. We had thought most commuters were traveling to downtown San Francisco, but the commute data made clear that the most common commutes were relatively short. It would be interesting to explore who makes the longest commutes, and how commute length relates to income. For example, do people commute far because they cannot afford housing near economic centers, or because they prefer the amenities of one area over another and can afford a car and the time it takes to reach a well-paid job?

The pattern of shorter commutes can be seen in the first map below, which plots our constructed time cost of commuting across the Bay Area. It suggests that our initial hypothesis was not entirely wrong, but was misguided: instead of a single economic center with short commutes, there are several. Central San Francisco is certainly a popular destination, with surrounding areas that have easy access to jobs, but other areas of the Bay offer the same benefit. UC Berkeley is one such center, where students most likely live around campus and their “commutes” consist of walking to class. North of the city, Santa Rosa’s center has low commute costs, with higher commute costs around the city’s periphery. Note: Zones without any commute routes with more than 20 people were removed.

Wondering how these findings stacked up against the distribution of rental prices, we plotted our roughly 100k rental listings. At first glance, both prices and the density of listings are clearly higher in San Francisco; further inland, listings are fewer and generally cheaper. A clear theme on the map is the geography of rents on the east versus the west side of the Bay. While Marin, San Francisco, and San Mateo counties appear largely unaffordable, rents in Alameda and Contra Costa counties are generally cheaper.

To combine our commute data and rental data, we plotted the median rent in each zone below. Aggregating rents by zone makes it easier to see the cost differences between neighborhoods. This map further challenged our hypothesis: while lower rents may be more common in the East Bay, prices vary widely depending on where the residence is. For example, ‘the flats’ near the bay in both Oakland and Berkeley have significantly lower median rents than the hills of either city. Conversely, the map shows that San Francisco has some relatively lower-cost neighborhoods, although they are sparse.

To understand the interplay between rental price and commute time, we combined the two variables into an index. The index ranges from 0 to 100, where 100 represents the highest combined commute and rent cost in the dataset, and every other observation is scaled relative to that highest value. The area with the highest index juts into the bay from Marin; the lowest is a few blocks in Vallejo.
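One way to build such an index is sketched below. It adds each zone’s monthly median rent to its monthly commute time cost (minutes times the $20 per one-way minute estimate) and divides by the maximum, so the most expensive zone scores 100. Scaling by the maximum follows the description “placed in relation to the highest.” A min-max scaling, which would pin the cheapest zone at exactly 0, is an equally plausible reading; both readings are assumptions. The input values are hypothetical.

```python
COST_PER_MINUTE = 20.0  # dollars per month per one-way commute minute (estimate above)


def affordability_index(rents, commute_costs):
    """Scale each zone's combined monthly cost (rent + commute time cost)
    so that the most expensive zone scores 100."""
    combined = [r + c for r, c in zip(rents, commute_costs)]
    top = max(combined)
    return [100.0 * x / top for x in combined]


# Hypothetical zones: median rents and average one-way commute minutes.
rents = [2400.0, 1500.0, 3400.0]
minutes = [30.0, 45.0, 15.0]
costs = [m * COST_PER_MINUTE for m in minutes]
print([round(v, 1) for v in affordability_index(rents, costs)])
```

Because commute costs enter as raw dollars, a zone’s rent usually dominates its score. This is the weighting effect discussed in the next paragraph.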

We find that there are small pockets of low-rent, low-commute neighborhoods, including areas of Vallejo and Oakland. There are also areas that appear affordable on the map despite relatively long commutes, particularly the region stretching from Martinez to Pittsburg. Our index rates these areas as affordable because commute costs, as we estimated them, are lower overall than rent costs and therefore carry less weight in the combined index.


Although we created a per-minute commute cost using the time cost to a median earner, we knew that our data should reveal the preferences of renters, showing the implicit value they place on commuting (see FiveThirtyEight’s analysis of commutes in NYC). To back out this revealed cost of an additional minute of commute, we regressed rent on commute times with a number of controls. Initially, we built separate models using only driving time, only transit time, or only the combined commute variable as the independent variable. However, these models were quite weak and noisy, so we created the four models below.

The four models all include the following control variables:

  • Number of Bedrooms
  • Square Feet
  • Whether the area is marked as a community of concern
  • The number of commuters in each commute pattern
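A regression of this shape can be sketched with ordinary least squares in NumPy. The data below are synthetic, generated only to show the mechanics; they are not the project’s listings. The coefficients used to generate them are invented for illustration. The sketch is not the authors’ model code, which may have used R or a statistics package. Adding a `commute ** 2` column to `X` would give a Model 4 style specification.

```python
import numpy as np


def fit_ols(y, X, names):
    """Ordinary least squares with an intercept; returns {name: coefficient}."""
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return dict(zip(["intercept"] + list(names), coefs))


# Synthetic listings for illustration only; these are NOT the project's data.
rng = np.random.default_rng(0)
n = 5000
commute = rng.uniform(10, 90, n)      # combined one-way commute, minutes
bedrooms = rng.integers(0, 4, n)
sqft = rng.uniform(400, 2500, n)
concern = rng.integers(0, 2, n)       # community-of-concern flag (0/1)
commuters = rng.integers(20, 400, n)  # commuters on the zone's routes
rent = (1500 - 40 * commute + 600 * bedrooms + 0.5 * sqft - 450 * concern
        + 0.1 * commuters + rng.normal(0, 200, n))

X = np.column_stack([commute, bedrooms, sqft, concern, commuters])
coefs = fit_ols(rent, X, ["commute", "bedrooms", "sqft", "concern", "commuters"])
print({k: round(float(v), 2) for k, v in coefs.items()})
```

With synthetic data the fitted coefficients should land close to the values used to generate them, which is a quick sanity check before fitting real listings.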

Models 1-4

Only the fourth model includes a commute-time-squared term, which captures how the cost of each additional minute changes as commutes get longer.

We found that transportation had much more explanatory power than driving alone; for that reason, model 3 uses the combined commute variable. That model suggests that for each additional minute of commute, rental prices fall by about $39. Model 4 is harder to interpret because of the quadratic term, but the coefficient on the squared term is positive, indicating that the longer the commute, the less each additional minute reduces rent.
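The quadratic model is easier to read through its marginal effect. If rent includes the terms b1·c + b2·c², then one more minute of commute changes rent by b1 + 2·b2·c. With b1 negative and b2 positive, that reduction shrinks as the commute c gets longer. The coefficients below are hypothetical, chosen only to illustrate the shape; they are not the Model 4 estimates.

```python
def marginal_rent_effect(b1, b2, minutes):
    """Change in monthly rent for one more one-way commute minute at a given
    commute length, for a model with rent = ... + b1*c + b2*c**2."""
    return b1 + 2 * b2 * minutes


# Hypothetical coefficients for illustration, not the estimates from Model 4.
b1, b2 = -60.0, 0.25
for c in (10, 30, 60):
    print(c, marginal_rent_effect(b1, b2, c))  # -55.0, -45.0, -30.0
```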


It was not surprising that the number of bedrooms significantly increased rental price: each additional bedroom raises the price by about $650. However, listings with more bedrooms are more commonly found outside the city center in less expensive areas, so this coefficient may be biased.


The community of concern flag was quite telling. An area is marked as a community of concern if it meets four or more of the following criteria: largely minority, low-income, low English proficiency, elderly, zero-vehicle households, disabled, rent-burdened, or single-parent. Rental prices fall sharply, by roughly $450, in areas with this flag. The coefficient on commuter count may not be very informative by itself, but it serves as a rough control for population density.

While we know that square footage is a significant predictor of rent, our initial results were weaker than expected: our initial regressions suggest that each additional 1,000 square feet raises rent by only $10 per month. This seemed far too low, yet it was stable. After looking more closely at the variable, we removed observations with square footage above the 99th percentile (approximately 3,000 square feet). The results below show a significant change in the square feet and bedroom coefficients (which are highly correlated), yet the commute coefficients are relatively stable.
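The percentile trim can be expressed as a boolean mask applied to every column of the listings. This is a minimal NumPy sketch with hypothetical arrays standing in for the listing columns, not the project’s own cleaning code. Note that listings with missing square footage also fail the comparison and are dropped.

```python
import numpy as np


def trim_upper(values, pct=99):
    """Return a boolean mask keeping observations at or below the given
    percentile. Missing values (NaN) compare as False, so they are dropped too."""
    values = np.asarray(values, dtype=float)
    cutoff = np.nanpercentile(values, pct)  # percentile computed ignoring NaN
    return values <= cutoff


# Hypothetical arrays standing in for the listing columns.
sqft = np.array([650.0, 900.0, 1200.0, np.nan, 48000.0])
rent = np.array([2100.0, 2600.0, 3100.0, 2800.0, 3000.0])
keep = trim_upper(sqft)
sqft, rent = sqft[keep], rent[keep]  # apply the same mask to every column
print(sqft, rent)
```

Applying one mask to all columns keeps the rows aligned, which matters when the trimmed data are passed back into the regressions.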


Once we subset the data to observations with more accurate square footage, we re-ran the models and found very different results. Square feet now explains a much larger share of the increase in rental price. In the model that includes the weighted, combined commute variable, rent rises by roughly $1,700 for every extra 1,000 square feet; this figure is far more believable than what the previous model predicted, if perhaps a bit high. However, the coefficients on bedrooms are now negative, presumably because of multicollinearity between the bedroom and square feet variables; moving forward, this problem warrants further inspection. Despite the changes in the bedroom and square feet coefficients, the coefficients capturing commute times stayed fairly stable, which is promising for answering our project’s question.


Overall, the commute cost appears to be higher than the $20 per additional minute we estimated from median earnings: our analysis suggests that renters value commute time at roughly $40 per one-way minute per month. However, our analysis has a number of limitations, including a possibly biased dependent variable and omitted variables.

Limitations and Next Steps

Largely, this project was exploratory. We were curious to find what bearing, if any, commute times had on rental prices. To establish causality, though, we would have to address the limitations of our data and methodology.

The data we used could be improved upon. The Craigslist data is from 2014; rental availability, prices, and density have likely changed since. The data may also be biased: prices posted on Craigslist are set by the poster, so the asking price may not reflect the true value or the accepted price. Listings on Craigslist may also be rentals with higher turnover than the average rental, making them systematically different. The commuter pattern data we relied on was simulated and warrants closer inspection. Also, commute times pulled from the Google service are approximations and reflect only one point in time (9 am on a Wednesday morning).

Beyond the data, our methodology was geared toward understanding trends in the Bay Area at a glance. For example, we used median income for all nine counties of the Bay Area to calculate time costs; to estimate true time costs, we would have to control for variation in income between zones. Relatedly, according to the results of our regressions, the index combining commute and rental prices underweights commutes. While we estimated the time cost of commuting at around $20 per one-way minute, our regressions suggest that renters value commute time at nearly twice that. If the index were recreated with the regression results, areas with longer commutes would appear less affordable than on our map.

Finally, neither our index nor our regressions take into account other amenities associated with specific locations. While it is reasonable to assume that people prioritize rent and commute during an apartment search, numerous other factors determine where people live. One notably absent from this analysis is school quality, which many families choose to prioritize over, or at least alongside, commute time or rent. Proximity to amenities or family, crime rates, and general environment also sway these decisions, and are more difficult to account for. By omitting these factors, we may be over- or underestimating the cost of commuting.

Lastly, there are important socioeconomic and ethical implications that our project does not capture. We found that the Bay Area does, in fact, have pockets of low-rent, low-commute areas. Yet these areas are likely communities of concern. If median earners are continually pushed out of city centers, it is at-risk communities who will be displaced.

An important next step to this work is to look at the populations most significantly impacted by these patterns. Moreover, more work should be done evaluating transportation options and further investigating how to create more affordable housing.






(1) To make commute destinations more realistic, centroids are weighted by rental listings. The difference between geographic centroids and weighted centroids is negligible in the more urban areas of the Bay, but weighting creates much more realistic commute destinations in less dense districts in places like Marin or San Mateo counties.

(2) The Google Directions API also has a traffic model called “best_guess,” but when we compared some commute times to the web version of Google Maps directions, it appeared to underestimate traffic the farther into the future the request was for. We felt it was important to incorporate pessimistic traffic models into our analysis.

Transit and Traffic in the Bay Area

The last posts have interrogated the linkages between what is and is not accessible via public transit in metropolitan areas. But what if your local bus or train is out of commission, or simply unreliable? Whether or not Bay Area Rapid Transit (BART) is truly as terrible as some think, Californians rely heavily on their cars to get around; a study from two years ago found that annual per-capita driving is 150% higher in California than in the rest of the United States.

The San Francisco Bay Area

The Bay Area is a large region composed of nine counties. Although the Bay is often lumped into one geographic zone, the characteristics of each county vary widely.

Source: Wine and Vine

The concentration of drivable roads varies. Below is a map of the Bay Area’s driving networks. As you can see, drivable networks are unevenly concentrated, with the heaviest concentrations in Contra Costa, Alameda, and Santa Clara Counties.

Source: Nine Counties, accessed via OSMnx

The darker the space on the map, the denser the driving network. The concentration of streets also indicates where people are likely working and commuting.

Understanding Commute Choices

To parse this question, it is helpful to look at the Bay Area through a different lens: Traffic Analysis Zones (TAZ). TAZ are constructed from census blocks and typically capture important information such as the number of cars per household, income, and employment within each zone. However, with nearly 4,000 TAZ in the Bay Area, the analysis quickly becomes confusing.

All TAZs

For this reason, it’s easier to understand the geographic make-up using a different classification called “super districts.”

Super districts
Super districts via TAZ data

This vantage point becomes clearer when looking at the super districts. The smaller districts are often more densely populated and have substantial daily flows between them. The Bay Area has tried to accommodate this massive volume of commuting with an intricate and vast road network.

Source: CA Highways

However, density and accessibility may not be the issue. A map of Alameda County’s drivable network alone shows that roads and driving options are not in short supply; in fact, the availability of roads may be encouraging more people to drive rather than use public transit or alternative modes, because driving is easier and cheaper. In basic supply-and-demand terms, the more roads are available, the cheaper driving becomes.

TAZ data overlaid with road networks accessed from MTC

Challenging the “More Supply, More Demand” Theory

But let’s pretend for a moment that the Bay Area functions as some sort of economic Upside-Down: what if the abundance of roads isn’t what explains the heavy reliance on cars and the resulting traffic?

If that is true, and the availability of roads is not what causes traffic, we also have to probe what public transit looks like. BART stations are located throughout the central part of the Bay Area: in Alameda, Contra Costa, and San Francisco Counties.

Berkeley and SF Open Data overlaid on TAZ

Vital Signs, a project of the Metropolitan Transportation Commission, put together a basic breakdown of commute patterns in the Bay Area. It found that a majority of commuters from each of these counties travel to another of them for work. So the supply is there, but where is the demand?

Using a buffer of about 60 miles, we can see that BART stations are accessible even outside their designated counties if people drove to the stations and parked their cars. However, only about 16% of commuters use BART to get to work (more often than not traveling to and from San Francisco, Alameda, and Contra Costa Counties).
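The buffer test amounts to asking whether a location lies within a given distance of any station. The map’s buffer was presumably drawn in the mapping tool. The sketch below is a stand-in that uses great-circle (haversine) distance, which measures straight-line distance over the Earth’s surface rather than driving distance. The station coordinates in any real use would come from the BART data; none are assumed here.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_buffer(point, stations, radius_miles):
    """True if `point` lies within `radius_miles` of at least one station."""
    return any(haversine_miles(point[0], point[1], s[0], s[1]) <= radius_miles
               for s in stations)
```

One degree of latitude is roughly 69 miles, so a 60-mile buffer reaches a little less than a degree north or south of a station.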

Source: TAZ and BART data

So the supply of BART, while it could certainly be better, is there, but the demand is not matching up.

Conversely, the supply of roads may not be as formidable as one might hope. Although the Bay Area has a wealth of roads, many of them are not easily navigable. Mapping the street network around coordinates in downtown San Francisco shows how many roads lead to dead ends, many of which are one-way streets.

Source: OSMnx, using coordinates from downtown SF

Highlighted in red are roads characterized as “endpoints,” which have no outlet. While the supply may be there, the quality of driving in San Francisco leaves much to be desired.
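The idea behind flagging endpoints can be shown without any mapping library. In a street network, a dead end is an intersection node touched by only one street segment. The sketch below is a dependency-free illustration of that idea, not OSMnx’s implementation. It ignores travel direction, so a two-way street stored as two directed edges counts once, and the node IDs are hypothetical.

```python
from collections import defaultdict


def dead_end_nodes(edges):
    """Return nodes touched by exactly one street segment.

    `edges` is an iterable of (node_a, node_b) segments. Direction is ignored,
    so a two-way street stored as A->B and B->A counts as one connection."""
    degree = defaultdict(int)
    seen = set()
    for a, b in edges:
        key = frozenset((a, b))
        if a == b or key in seen:  # skip self-loops and duplicate segments
            continue
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    return sorted(node for node, d in degree.items() if d == 1)


# Hypothetical block: a triangle of streets with one spur ending at node 4.
print(dead_end_nodes([(1, 2), (2, 3), (3, 1), (3, 4), (4, 3)]))  # [4]
```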


What else is accessible along public transit?

In the last blog post, I explored access to transportation, employment, and education in New York City. As a new local to the Bay Area, I decided to explore what is available along the Bay Area Rapid Transit system (BART).

After looking at popular spots along the transit route, I found that cultural centers and hubs, in particular cannabis dispensaries, are often accessible via public transportation.


Using SF OpenData, I pulled the locations of legally permitted marijuana clinics through an API; most of them are located right off of BART. The geocoded data was read through the API, put into a dictionary, converted into a data frame, and then stored as a .csv, which I read into CartoDB. One difficulty I came across was cleaning the coordinates column: the latitude/longitude values were read as objects rather than strings, so I could not use standard string-cleaning practices. However, I was able to use street addresses to get around that problem.
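The coordinate problem usually comes from the location field arriving as a nested object rather than a string. The sketch below flattens such records into numeric latitude/longitude columns and writes a CSV. It assumes a Socrata-style layout in which the location is either a `{"latitude": ..., "longitude": ...}` object or a GeoJSON-style point with `[longitude, latitude]` coordinates. The field names (`name`, `address`, `location`) are hypothetical, and the standard library’s `csv` module stands in for the data frame step.

```python
import csv
import io
import json

FIELDS = ["name", "address", "lat", "lon"]


def flatten_record(record):
    """Pull latitude/longitude out of a nested location object into flat,
    numeric columns. Field names are hypothetical."""
    loc = record.get("location") or {}
    if "coordinates" in loc:  # GeoJSON-style point: [longitude, latitude]
        lon, lat = loc["coordinates"]
    else:                     # object with separate latitude/longitude keys
        lat, lon = loc.get("latitude"), loc.get("longitude")
    return {
        "name": record.get("name", ""),
        "address": record.get("address", ""),
        "lat": float(lat) if lat is not None else None,
        "lon": float(lon) if lon is not None else None,
    }


def records_to_csv(raw_json, out):
    """Parse a JSON array of records, flatten each one, and write CSV rows to `out`."""
    rows = [flatten_record(r) for r in json.loads(raw_json)]
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return rows


if __name__ == "__main__":
    sample = '[{"name": "A", "location": {"latitude": "37.77", "longitude": "-122.41"}}]'
    buf = io.StringIO()
    records_to_csv(sample, buf)
    print(buf.getvalue())
```

Records with no location get empty lat/lon cells, which leaves room to geocode them from the street address, the workaround described above.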


Given the location of the dispensaries, I wondered whom the businesses cater to. They appear to cater to people who, by and large, don’t drive, or who prefer or are more likely to use public transit. I also gathered that the dispensaries’ choice to occupy spaces right along transit, which are likely more expensive to rent, speaks to the priorities of the Bay Area and to the large amount of support and commerce the dispensaries are likely to attract.

This was an interesting departure from what I expected to find, especially compared with what is accessible off the NYC subway system (fewer culturally charged stops, more business and financial institutions).

Access to Education in New York

During my time in NYC, I found that, despite Brooklyn’s large size, there were few transportation options. Only one train ran from Manhattan to the remote part of Brooklyn where I worked. If the L train was out of operation, I simply could not get to work, and that happened often. Even when the train was running, I still had to take a bus from the last stop to get to my school.

This is a pervasive problem in New York. You can see in the map below that, even though Manhattan is much smaller than the other boroughs, it has the highest density of train lines.

Looking more closely at this problem, I chose two maps from the CartoDB data library: NYC boroughs and NYC train lines. I overlaid these maps so the viewer can see which trains go where in the city, as well as the density of trains in different neighborhoods.

While this was an inconvenience for me as a teacher, I had the (albeit expensive) option of taking a cab or ride share to work. My students, however, did not have that option. If the trains were down due to maintenance or weather, it was likely that they would not be able to get to school at all.

The school where I worked was in the easternmost part of Brooklyn. As you can see from the above map, no train links to that outpost. Attendance was a grave concern at our school: only about 60-70% of students attended daily. I wondered how much of this was due to poor transportation.

However, this problem does not only affect adolescent students. The problem of access begins quite early, at the pre-kindergarten stage. Many parents rely on Pre-K services because they allow them to go to work while their children are looked after. The service is very beneficial to the children as well: students who attend Pre-K for at least a year before entering school perform much better.

The map below was made by overlaying data I gathered about Pre-K locations in NYC on NYC borough data from the data library. The data set came from NYC’s open data portal.

The density of Pre-K centers is much higher on the small island of Manhattan than in the larger boroughs. Comparing this map to the subway map reveals an even more telling story: there are fewer schools where there are fewer trains, and vice versa.

What’s more, consider the concentration of school districts. The map above shows that Manhattan is almost one district in and of itself, offering a range of transportation options to those in the district. Sadly, those who live in Manhattan are often middle- to upper-class families.

Less resourced families, often families of color, live in the many school districts located outside of Manhattan, meaning they also have less access to schools. By way of city planning, minority youth are being disadvantaged.

While these maps are easily digestible for anyone, their intended audience is policymakers and analysts. The hope is that simple graphics such as these can illuminate how unequal access to opportunity still plagues an otherwise modernized, developed country.

As troubling as these maps are, they are important. They tell the story of resources concentrated in more affluent, often white, neighborhoods. The cycle of poverty is vicious: parents cannot work because they have less access to Pre-K. Students who are not attending Pre-K will perform more poorly in school and have less access to schools. Their attendance rates will slip, as will their performance and graduation rates. They likely will not go to college, leaving them idle in the place where they grew up.

Policy should focus on opening access to the families who are otherwise stuck in these cycles. By providing more infrastructure (i.e. transportation to schools, Pre-K centers), the concentration of poverty and poverty itself is likely to dissipate.


Born and bred in the Southern California mountains, I have spent most of my life outside. I continued this streak as an undergraduate at Cal Poly San Luis Obispo, where I double-majored in history and political science.


Following college, I moved to New York City as a member of Teach For America. For two years, I taught high school global history in an under-resourced school in Brooklyn and fell deeply in love with education. While in New York, I received my Master’s in Education (M.Ed).

After two years in the City, I came back to the West Coast to get my Master of Public Policy (M.P.P.) with an emphasis on K-12 public education reform. Since then, I have also become increasingly interested in housing policy. In the spring of last year, I did a feasibility analysis to test whether tiny houses could be a viable option for housing Oregon’s homeless.

I’m hoping to learn more about spatial analysis so that I can look closely at the linkages between subsidized housing and education outcomes.

Although my academic training is in research and quantitative methods, my real passions are longboarding, amateur windsurfing, climbing, hiking, and hanging out with dogs of all creeds and color.