Predictive Analytics and Cities

It’s been a big year for predictive analytics.  

I’ve been following Nate Silver’s blog on the election, and his deep data analysis cut through the noise, was consistent, and ultimately proved correct.  

For another (eerily prescient) example, look at this 2006 prediction of what a major coastal storm could do to the East Coast.

We have lots and lots of data about what has happened, and we’re just starting to figure out how to use it.

Tomorrow, I’m attending a conference on Innovation and Cities at Harvard’s Kennedy School, and I’ll be speaking on a panel on predictive analytics and cities. I’ll be joined by New York City’s Director of Analytics, Michael Flowers, and Chicago’s (first ever) Chief Data Officer, Brett Goldstein. Both Brett and Michael are way deeper on this subject than I am, so my hope is to simply ask some provocative questions, and perhaps give some examples from outside the civic sector.

A few weeks ago at the Ford Foundation’s Wired for Change conference, MIT’s Cesar Hidalgo gave a thought-provoking talk on the power of big data and predictive analytics. A big takeaway from his talk was that by looking at how data is connected (i.e., viewing data as a network rather than as sums of numbers), we can quickly and compellingly start to see new trends, tell new stories, and predict future outcomes.

Cesar presented some research that looked at national exports in terms of connections between products and industries.  By creating such a “map” of the ecosystem, using historical data, it actually becomes relatively easy to guess which sectors will continue to grow and how.  For example, here is a look at South Korea’s export economy over time:

This simple, but profound, change in approach holds tons of potential for us to understand what’s going on in our cities and countries, and better prepare (for economic changes, natural disasters, etc.).  You can play with more visualizations of world economic data at MIT’s Observatory of Economic Complexity.
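The talk itself didn’t come with code, but the “product space” idea behind those maps can be sketched roughly. The Python below is a minimal sketch, assuming the measures used in Hidalgo’s published product-space research: a country counts as exporting a product when its Balassa revealed comparative advantage (RCA) is at least 1; two products are “close” when countries tend to export both (the smaller of the two conditional probabilities); and a country’s “density” around a product is the proximity-weighted share of that product’s neighbors it already exports. The countries, products, and numbers are made up, and the function names are hypothetical.

```python
"""Toy product-space sketch: RCA -> proximity -> density (hypothetical data)."""

# Hypothetical export values by country and product (made-up numbers).
exports = {
    "A": {"chips": 40, "phones": 40, "cars": 10, "oil": 10},
    "B": {"chips": 30, "phones": 30, "cars": 40, "oil": 0},
    "C": {"chips": 0, "phones": 0, "cars": 20, "oil": 80},
    "D": {"chips": 30, "phones": 0, "cars": 30, "oil": 40},
}


def rca_matrix(exports):
    """M[c][p] = 1 if country c has Balassa RCA >= 1 in product p, else 0."""
    products = sorted({p for row in exports.values() for p in row})
    total = sum(v for row in exports.values() for v in row.values())
    country_tot = {c: sum(row.values()) for c, row in exports.items()}
    product_tot = {p: sum(row.get(p, 0) for row in exports.values()) for p in products}
    M = {}
    for c, row in exports.items():
        M[c] = {}
        for p in products:
            if country_tot[c] == 0 or product_tot[p] == 0:
                M[c][p] = 0
                continue
            # Share of p in c's exports, relative to p's share of world exports.
            rca = (row.get(p, 0) / country_tot[c]) / (product_tot[p] / total)
            M[c][p] = 1 if rca >= 1.0 else 0
    return M, products


def proximity(M, products):
    """phi[i][j] = min(P(RCA in i | RCA in j), P(RCA in j | RCA in i))."""
    ubiquity = {p: sum(row[p] for row in M.values()) for p in products}
    phi = {}
    for i in products:
        phi[i] = {}
        for j in products:
            both = sum(row[i] * row[j] for row in M.values())
            # The smaller conditional probability = dividing by the larger ubiquity.
            denom = max(ubiquity[i], ubiquity[j])
            phi[i][j] = both / denom if denom else 0.0
    return phi


def density(M, phi, country, products):
    """Proximity-weighted share of each product's neighbors the country already exports."""
    dens = {}
    for p in products:
        others = [j for j in products if j != p]
        denom = sum(phi[p][j] for j in others)
        num = sum(phi[p][j] * M[country][j] for j in others)
        dens[p] = num / denom if denom else 0.0
    return dens


M, products = rca_matrix(exports)
phi = proximity(M, products)
dens = density(M, phi, "A", products)
for p in products:
    if M["A"][p] == 0:  # only products A does not yet export competitively
        print(p, round(dens[p], 2))
# Expected output:
# cars 0.7
# oil 0.4
```

In this toy data, country A sits closer to cars than to oil in the network, which is the kind of signal these maps use to guess which sectors a country is likely to grow into next.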

So, looking ahead to tomorrow’s conversation, the specific topic is:

Predictive analytics cut across issues and datasets. When it comes to potential new forms of analytics, what are the low-hanging fruit? What are ambitious, longer-term ideas of new ways to use predictive analytics to tackle urban issues? What could/should cities do together?

I have some ideas. One is generally taking an open data and open standards approach at the foundational level (to widen the audience of potential data miners). Another is looking for data sets that tell us a lot about how the city works but might not be the first ones we think of (such as taxi drop-off locations, long-distance call originations, tweets, supermarket and other consumer spending data, etc.). I’ll keep noodling on it today and tonight.

What do you think?

15 comments on “Predictive Analytics and Cities”

It sounds really small, but making the data formats standard and open across cities and countries is probably the most important thing we can do.

What gets measured matters. And what gets measured can be tweaked to improve over time. Hidalgo and Hausmann’s data on manufacturing is really interesting, but international trade is a small part of our economy here in the U.S. For big data to really make a difference in growth and government service delivery, we’ll have to do a lot more to measure inputs and outputs in huge sectors of the economy such as healthcare, transit, education, and finance…

Dan — great to see you!

Yes, I totally agree.

I didn’t include the Hidalgo stuff necessarily to point to exports as an important indicator, but rather to show how powerful linkages and visualizations are (Cesar did a great job explaining this in his talk).

I am also really interested in what other data sets might have predictive value. For example, a friend just mentioned San Francisco’s SFpark program, which tracks parking space occupancy (and provides an API on top of it). Very, very cool project. I wonder what the equivalent opportunities are in some of the other spaces you mention, and what lower-cost ways of unlocking them there might be (in the case of SFpark, the infrastructure is expensive, but the software and API could be replicated relatively inexpensively).

What about examples from private companies that are collecting data for a specific end-user utility or value, but whose data tells us a lot about our worlds, our cities? Examples: Waze (traffic patterns), Asthmapolis and others (public health), TaskRabbit (urban employment patterns).

Yeah, I think it would be worth building a matrix of urban issues vs. web companies working in each space (e.g., RunKeeper and health) to see whether there is potentially meaningful data exhaust that could conceivably be tapped.

Another major issue with all of this is that governments themselves keep tons of data but aren’t necessarily making the best use of it yet.

Totally agree. is working on several code enforcement & life safety matrix and vertically integrated applications that leverage open public data with massive commercial datasets from Nokia/NAVTEQ. Nick, great point on the issue of empowering government with information dashboards vs actionable insights that can integrate into the municipal enterprise system. Great post all around and discussion.

I work for Pentaho. We have a number of open source technologies for transforming, and performing predictive analytics on, all kinds of data.

Here are a few specific use cases I have come across:
* All kinds of capacity planning: This includes long-term (e.g., staffing based on population trends) and short-term (e.g., electricity based on historical hourly consumption per household).
* Emergency vehicle placement: Predict times/locations where accidents are most likely to occur and have emergency vehicles placed nearby instead of at facilities.
* Traffic flow: Optimize the efficiency of public transport by analyzing detailed historical event data.
* Weather related: Correlate historical weather data with accident/outage/water quality data to aid in planning during extreme and normal weather patterns.
* Budget analysis: Find outliers in budget proposals to identify errors/omissions/illegal activity.
* Crime prevention: Analyze long-term crime records to identify suitable prevention programs, correlate similar cases, etc.
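The budget-analysis case can be illustrated with a simple, widely used outlier test: the modified z-score, which measures each value’s distance from the median in units of the median absolute deviation (MAD), so the outliers being hunted don’t distort the yardstick. This is a minimal Python sketch, not Pentaho’s method; the line items are hypothetical, `flag_outliers` is an illustrative name, and the 3.5 cutoff is the one commonly attributed to Iglewicz and Hoaglin.

```python
from statistics import median


def flag_outliers(amounts, threshold=3.5):
    """Return indices of amounts whose modified z-score exceeds `threshold`.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation from the median.
    """
    med = median(amounts)
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        # At least half the values equal the median, so the score is undefined;
        # this sketch simply flags nothing in that case.
        return []
    return [i for i, x in enumerate(amounts)
            if abs(0.6745 * (x - med) / mad) > threshold]


# Hypothetical departmental line items (in thousands of dollars); the last two
# look like misplaced-decimal errors.
line_items = [120, 115, 130, 118, 125, 122, 1250, 12]
print(flag_outliers(line_items))  # → [6, 7]
```

Using the median and MAD instead of the mean and standard deviation is the key design choice: a single wildly wrong entry inflates the standard deviation enough to hide itself, but barely moves the median.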

This may not have enough of a predictive focus for you, but there’s a company called Revelstone that just completed the Code for America accelerator program. They’re aiming to build a platform that generates and tracks performance analytics for cities, and allows cities to compare benchmarks. You might want to reach out to them and see what metrics they’ve identified as the most relevant.

Nick, I work in communications at IBM. We have been very focused on the power of big data to provide insights that drive strategic decision making, particularly for cities. Take a look at our Smarter Cities page. You might also check out a project called CityForward, a public website designed to allow users to capture, catalog, and analyze public urban data. There is also terrific work being done by Carlo Ratti and his team at MIT in the SENSEable City Lab, with amazing urban data visualizations. Hope this helps! Break a leg tomorrow.

While it’s harder to quantify and analyze, I wish California would get smarter about using data to inform policy decisions rather than relying on costly and blunt ballot propositions every election cycle.

The fantastic accuracy of polling data in this week’s election makes me think we could be polling/surveying/otherwise taking people’s pulse on issues to inform policy priorities and decisions.
