Sphereless: Dissecting the still-breathing body of "Open Data"

Benjamin Welby has a great 5-part series on Open Data, summarising a lot of the current state in local government, and raising some good questions about how local government can lead on a lot of this. His first article picks up on my "Open Data must die!" post, and got me thinking about the term itself again.

Why is this an interesting thought exercise? Maybe the names don't matter, and function is everything. But at the same time, names crystallise our shared understanding of why we're doing something.

So why do we use the term "Open"?

Partly legacy - open source, open access, open-not-closed. All good stuff.

But in the context of governance, what role does the term "open" play? In fact, if we look at the two aspects of a) service delivery and b) democratic engagement, why is openness important? Answering this question, I believe, is key to moving open data forward - by clarifying new purposes other than openness in its own right. By setting an agenda other than "being open" we can start to look at being smarter and more productive, but within a context of accountability and shared decision-making.

Service Delivery

Let's take a break for a moment and consider the commercial world of "open data" - the Flickrs and the Googles and the like. A lot of the push for open government data came from a shift in the web paradigm from "data-as-a-website" to "data-as-an-API". Suddenly control of the data opened up - an essential inevitability as websites moved to a social model of importing data from users more and more. The website became a store, rather than a presence in itself.

Openness in this commercial world meant (or means) that developers were no longer restricted to the path to access this data that the website builders had pre-determined. You could suddenly write your own slideshow script based on your (or someone else's) Flickr photo stream, for example. Or print it out. Or turn it into text.

Returning to government services, though, we have two types of user and two types of data.

First, we need to differentiate between data used by individuals and data used by policy-makers.

Second, we need to compare data about individuals with data about systems. By systems, I mean any organisation or structure that can be seen as a "whole" - such as transport systems, finance systems, demographic systems and geographic areas, etc.

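This gives us a basic 2-way matrix onto which we can position "data" (which, after all, is ubiquitous):

                         Data about individuals    Data about systems
Used by individuals      Box 1                     Box 2
Used by policy-makers    Box 3 (greyed out)        Box 4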

Box 3 is greyed out because it doesn't count, imho. Ignore it?

The commercial data mentioned above falls into box 1 - a user accessing their own data. This is largely ignored (or discussed in fairly esoteric circles) under the governmental "Open Data" banner because it's hard. It's probably also the most useful, but maybe one to come back to. People tend to think of health data and benefit information, but one could also include library data (reading histories, current books, etc) and so forth in here.

Box 2 - systemic data used by individuals - is where a lot of people would like to position Open Data, whether it fits or not. The idea of the "individual" accessing data appeals to our consumerist lifestyle, like governments fulfilling their public-good role for the ultimate satisfaction of the private citizen.

(The "Armchair Auditor" idea falls into this precise trap of supposing that individuals will hold Government to account. There is nothing more romantic than a single person overcoming Universal Might.)

Some of this data is actually really, really useful. Transport data, opening times, prices, etc - this is the realm of information that we all use on a day-to-day basis. But that kind of data is "small" information - we consume something very specific in very short bursts, such as "next bus from stop X", or "who is my councillor?". These could almost be considered "facts" about the world.

Other data is also relevant to individuals, but from more of a research perspective. Being able to present data differently, such as with TheyWorkForYou, or aggregating data up to make travel-time-vs-house-price maps, starts to address the notion of how we can interact with the data in a user-experience kind of way. But to engage, users must not only interact with the data, but also with the democratic process (see box 4). The boundary between these two seems relatively unexplored so far.

Box 3 - policy-makers accessing data about individuals. I'm actually going to call this a void box, because I think at this point we transmogrify into anecdotal evidence, rather than data. Policy-makers love using stories about individuals, far more than comparing their stats in a Top-Trumps-esque kind of way. Comment if you feel differently.

Box 4 - systemic data used by policy-makers - really leads us into the idea of "democratic engagement", because it's where the overarching power lies. This is the data/evidence used to make decisions at a higher level. It needs to be aggregated to resolve complexity (and gain understandability), but be reliable and resilient enough for decisions to be made confidently.

This box includes data about government activities, such as spending data, as well as data about geographical areas, such as the IMD and National Indicators and demographics and so on.
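For anyone who thinks better in code, the four boxes can be sketched as a simple lookup on two questions: who uses the data, and what the data is about. This is only an illustration of the matrix, not a real classification tool; the names (User, Subject, box_for) and the example datasets are made up for the purpose, with placements following the examples above.

```python
from enum import Enum


class User(Enum):
    """Who is using the data."""
    INDIVIDUAL = "individual"
    POLICY_MAKER = "policy-maker"


class Subject(Enum):
    """What the data is about."""
    INDIVIDUAL = "individual"
    SYSTEM = "system"


# Box numbers as used in this post, keyed by (who uses it, what it is about).
BOXES = {
    (User.INDIVIDUAL, Subject.INDIVIDUAL): 1,
    (User.INDIVIDUAL, Subject.SYSTEM): 2,
    (User.POLICY_MAKER, Subject.INDIVIDUAL): 3,
    (User.POLICY_MAKER, Subject.SYSTEM): 4,
}


def box_for(user: User, subject: Subject) -> int:
    """Return the matrix box for a given kind of user and kind of data."""
    return BOXES[(user, subject)]


# Hypothetical placements, following the examples in the text.
examples = {
    "my library reading history": (User.INDIVIDUAL, Subject.INDIVIDUAL),
    "next bus from stop X": (User.INDIVIDUAL, Subject.SYSTEM),
    "council spending data": (User.POLICY_MAKER, Subject.SYSTEM),
}

for name, (user, subject) in examples.items():
    print(f"{name}: box {box_for(user, subject)}")
```

The same dataset can of course land in more than one box depending on who is looking at it, which is rather the point: the box describes a use, not a file.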

Opening up Box 4

Box 4 is the most interesting, because it hasn't really been done before. Or rather, not in an open sense. Plenty of companies use "systemic" data - or "big data" if you like - to explore trends and make decisions. But all that data is valuable and therefore closed.

This is where and why Open Data hits challenges: the systemic data in this box is fundamentally tied to how administrations work and think. That is, it represents a worldview upon which certain types of decision can be made - accountable ones, justifiable ones, trackable ones. Data here, whether it's financial, organisational, demographic statistics or what-have-you, has gone through a long history of hammering out. To be usable, it must be comparable, and to be comparable it must be rigorously defined, in terms of what's collected and what it means.

It's why we have National Indicators and the ESD and consultations on standards within government.

And this leads to a bit of a paradox, which the Open Data world is currently trying to grapple with:

First, the data needs to be understood before it can be used, precisely because it's so tightly defined.

Second, the data - even after being understood - is only useful in its original context (the policy-making hierarchy).

Both of these together form an interlocking puzzle: If people don't understand the data, how can they use it? And if it's not useful to them in their context, why would they want to understand it?

(Additionally, this paradox is semi-locked inside a Japanese puzzle box. Even if you get both, do you necessarily have the power to influence decisions?)

Democratic Engagement

This gives us effectively a 3-way combo lock on Open Data, which immediately answers our second purpose - that of democratic engagement. There is a philosophical argument for democracy, and indeed a philosophical argument for "openness". But I would say we have that already - people want to engage, even if out of a sense of ownership ("I pay my taxes and vote, therefore I should have a say.").

But people are also busy and/or lazy - there's no philosophical drive to see how government works any more than I have a philosophical drive to see how my computer works while I'm using it. Rather, I look at source code when I want to change it or write my own. The philosophy of "openness" rapidly transforms into a reality of "effect". And it's this sense of being able to "effect" that the idea of Open Data comes down to.

Next time you see a dataset being opened up, ask ye 3 questions of it:

Can I understand it?
Is it useful to me in my life?
Who's listening for answers that might lie within it?

Too often "Open" data ignores all 3 of these questions. "Raw data now" ignores these questions. You cannot separate data from policy and expect engagement.

"Open" is a mindset, not a piece of content.

"Open" cannot be an end in its own right.

Friday, April 27, 2012
